TITLE OF THE INVENTION
NUCLEIC ACID BINDING OF MULTI-ZINC FINGER TRANSCRIPTION FACTORS 



CROSS-REFERENCE TO RELATED APPLICATIONS 
[0001] This application is a continuation of International Appln. PCT/EP00/05582, filed on
June 9, 2000, designating the United States of America (International Publ. No. WO 01/00864,
published January 4, 2001), the entire contents of which are incorporated by this reference.

TECHNICAL FIELD 

[0002] The invention relates to biotechnology generally, and more specifically to a 
method of identifying transcription factors. 

BACKGROUND 

[0003] Zinc fingers are among the most common DNA binding motifs found in 
eukaryotes. It is estimated that there are 500 zinc finger proteins encoded by the yeast genome and 
that perhaps 1% of all mammalian genes encode zinc finger containing proteins. These proteins
are classified according to the number and position of the cysteine and histidine residues available 
for zinc coordination. 

[0004] The CCHH class, typified by the Xenopus transcription factor IIIA (19), is the
largest. These proteins contain two or more fingers in tandem repeats. In contrast, the steroid
receptors contain only cysteine residues that form two types of zinc-coordinated structures with
four (C4) and five (C5) cysteines (28). Another class of zinc fingers contains the CCHC fingers.
The CCHC fingers, which are found in Drosophila, and in mammalian and retroviral proteins,
display the consensus sequence C-X2-C-X4-H-X4-C (Refs. 7, 21, 24). Recently, a novel
configuration of CCHC finger, of the C-X5-C-X12-H-X4-C type, was found in the neural zinc finger
factor/myelin transcription factor family (Refs. 11, 12, 36). Finally, several yeast transcription
factors such as GAL4 and CHA4 contain an atypical C6 zinc finger structure that coordinates two
zinc ions (Refs. 9, 32).
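
As an illustration only, the two CCHC consensus spacings quoted above can be written as regular expressions and used to scan an amino acid sequence. The following Python sketch is not part of the described method. The function name `find_cchc_fingers` and the test peptide are hypothetical, and "X" is assumed to stand for any residue.

```python
import re

# Consensus spacings taken from the text; "X" is assumed to mean any residue.
CCHC_PATTERNS = {
    "C-X2-C-X4-H-X4-C": r"C.{2}C.{4}H.{4}C",
    "C-X5-C-X12-H-X4-C": r"C.{5}C.{12}H.{4}C",
}


def find_cchc_fingers(protein):
    """Return (consensus_name, start_index, matched_residues) for each match.

    A lookahead is used so that overlapping candidate fingers are all reported.
    """
    protein = protein.upper()
    hits = []
    for name, pattern in CCHC_PATTERNS.items():
        for match in re.finditer(f"(?=({pattern}))", protein):
            hits.append((name, match.start(), match.group(1)))
    return hits


# Hypothetical toy peptide, not a real protein sequence.
toy_peptide = "AACKKCGGGGHLLLLCAA"
print(find_cchc_fingers(toy_peptide))
```

A match of this kind only flags a candidate spacing of cysteine and histidine residues; it says nothing about whether the region actually coordinates zinc.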

[0005] Zinc fingers are usually found in multiple copies (up to 37) per protein. These
copies can be organized in a tandem array, forming a single cluster or multiple clusters, or they can
be dispersed throughout the protein. Several families of transcription factors share the same overall
structure by having two (or three) widely separated clusters of zinc fingers in their protein
sequence. The first, the MBPs/PRDII-BF1 transcription factor family, includes the
Schnurri and Spalt genes (1, 3, 6, 14, 33). Both MBP-1 (also known as PRDII-BF1) and MBP-2
contain two widely separated clusters of two CCHH zinc fingers. The overall similarity between 
MBP-1 and MBP-2 is 51%, but the conservation is much higher (over 90%) for both the N- 
terminal and the C-terminal zinc finger clusters (33). This indicates an important role of both 
clusters in the function of these proteins. In addition, the N-terminal and C-terminal zinc finger 
clusters of MBP-1 are very homologous to each other (3). 

[0006] The neural specific zinc finger factor 1 and factor 3 (NZF-1 and NZF-3), as well 
as the myelin transcription factor 1 (MyT1, also known as NZF-2), belong to another family of
proteins containing two widely separated clusters of CCHC zinc fingers (11, 12, 36). Like the 
MBP proteins, different NZF factors exhibit a high degree of sequence identity (over 80%) 
between the respective zinc finger clusters, whereas the sequences outside of the zinc finger region 
are largely divergent (36). In addition, each of these clusters can independently bind to DNA, and 
recognizes similar core consensus sequences (11). NZF-3 binds to a DNA element containing a 
single copy of this consensus sequence but was shown to exhibit a marked enhancement in relative 
affinity to a bipartite element containing two copies of this sequence (36). This finding suggests 
that the NZF factors may also bind to reiterated sequences. However, the mechanism underlying 
the cooperative binding of NZF-3 to the bipartite element is currently unknown. 

[0007] The Drosophila Zfh-1 and the vertebrate δEF1 proteins (also known as ZEB or
AREB6) belong to a third family of transcription factors. This family is characterized by the
presence of two separated clusters of CCHH zinc fingers and a homeodomain-like structure (see
FIG. 1A) (Refs. 4, 5, 35). In δEF1, the N-terminal and C-terminal clusters are also very
homologous and were shown to bind independently to very similar core consensus sequences (10).
Recently, it was shown that mutant forms of δEF1 lacking either the N-terminal or the C-terminal
cluster have lost their DNA binding capacity, indicating that both clusters are required for the
binding of δEF1 to DNA (31). The Evi-1 transcription factor was shown to contain 10 CCHH zinc
fingers; seven zinc fingers are present in the N-terminal region and three zinc fingers in the
C-terminal region (22). With this factor the situation is different from the transcription factors
described above, because the two clusters bind to two different target sequences, which are bound
simultaneously by full-length Evi-1 (20). Binding of full-length Evi-1 is mainly observed when the
two target sequences are positioned in a certain relative orientation, but there was no strict
requirement for an optimal spacing between these two targets.

[0008] Cell-cell adhesion is predominantly a necessity during cell differentiation, tissue 
development, and tissue homeostasis. The effect of disrupted cell-cell adhesion is displayed in 
many cancers, where metastasis and poor prognosis are correlated with loss of cell-cell adhesion. 
E-cadherin, a homophilic Ca2+-dependent transmembrane adhesion molecule, and the associated
catenins are among the major constituents of the epithelial cell-junction system. E-cadherin exerts 
a potent invasion-suppressing role in tumor cell line systems (Refs. 46, 47) and in in vivo tumor 
model systems (Ref. 48). Loss of E-cadherin expression during tumor progression has been 
described for more than 15 different carcinoma types (49). Extensive analysis has made clear that
aberrant E-cadherin expression as a result of somatic inactivating mutations of both E-cadherin 
alleles is rare and so far largely confined to diffuse gastric carcinomas and infiltrative lobular 
breast carcinomas (50, 51). Northern analysis and in situ hybridization studies revealed that 
reduced E-cadherin immunoreactivity in human carcinomas correlates with decreased mRNA 
levels (52-54). Analysis of mouse and human E-cadherin promoter sequences revealed a conserved 
modular structure with positive regulatory elements including a CCAAT-box and a GC-box, as
well as two E-boxes (CANNTG) with a potential repressor role (Refs. 55, 56). Mutation analysis 
of the two E-boxes in the E-cadherin promoter demonstrated a crucial role in the regulation of the 
epithelial specific expression of E-cadherin. Mutation of these two E-box elements results in the 
up regulation of the E-cadherin promoter in dedifferentiated cancer cells, where the wild type 
promoter shows low activity (55, 56). 

SUMMARY OF THE INVENTION 
[0009] The invention relates to a method of identifying transcription factors involving
providing cells with a nucleic acid sequence including a sequence CACCT ([the first 5 nucleotides
of ]SEQ ID NO:1) as bait for the screening of a library encoding potential transcription factors and
performing a specificity test to isolate the factors. Transcription factors identified using the
method include separated clusters of zinc fingers such as, for example, a two-handed zinc finger
transcription factor. At least one such zinc finger transcription factor, denominated "SIP1", induces
tumor metastasis by down regulation of the expression of E-cadherin. Compounds interfering with
SIP1 activity can thus be used to prevent tumor invasion and metastasis.



[0010] The mechanism of DNA binding remains poorly understood for most of the
previously identified complex factors. We have characterized the DNA binding properties of
vertebrate transcription factors belonging to the emerging family of two-handed zinc finger
transcription factors such as δEF1 and SIP1. SIP1 is a member of this transcription factor family,
which was recently isolated and characterized as a Smad-interacting protein (Ref. 34). SIP1
and δEF1, a transcriptional repressor involved in skeletal development and muscle cell
differentiation, belong to the same family of transcription factors. They contain two separated
clusters of CCHH zinc fingers, which share high sequence identity (>90%). The DNA-binding
properties of these transcription factors have been investigated. The N-terminal and C-terminal
clusters of SIP1 show high sequence homology as well, and according to the invention each binds
to a 5'-CACCT sequence ([the first 5 nucleotides of ]SEQ ID NO:1). Furthermore, high affinity
binding sites for full length SIP1 and δEF1 in the promoter regions of candidate target genes like
Brachyury, α4-integrin and E-cadherin, are bipartite elements composed of one CACCT sequence
([the first 5 nucleotides of ]SEQ ID NO:1) and one CACCTG sequence (SEQ ID NO:2). No strict
requirement for the relative orientation of both sequences was observed, and the spacing between
them (also denominated as N) may vary from 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, . . ., to at least 44 bp.
For binding to these bipartite elements, the integrity of both SIP1 zinc finger clusters is necessary,
indicating that they are both involved in binding to DNA. Furthermore, SIP1 binds as a monomer
to a CACCT-XN-CACCTG site (SEQ ID NO:1 and SEQ ID NO:2 separated by XN), by having one
zinc finger cluster contacting the CACCT ([the first 5 nucleotides of ]SEQ ID NO:1), and the other
zinc finger cluster binding to the CACCTG sequence (SEQ ID NO:2).
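
The bipartite element described above (one CACCT half-site, a spacer, and one CACCTG half-site) lends itself to a simple sequence scan. The following Python sketch is offered only as an illustration and is not the procedure used by the inventors. It assumes a single strand read 5' to 3', handles only the CACCT-spacer-CACCTG arrangement (other relative orientations are not covered), and uses 44 bp as a default upper bound although the text states "at least 44 bp". The function names and the test sequence are hypothetical.

```python
FIRST_HALF_SITE = "CACCT"    # half-site as given in the text
SECOND_HALF_SITE = "CACCTG"  # extended half-site as given in the text


def find_all(sequence, motif):
    """Return every start index at which motif occurs (overlaps included)."""
    last_start = len(sequence) - len(motif)
    return [i for i in range(last_start + 1) if sequence.startswith(motif, i)]


def find_bipartite_sites(dna, max_spacer=44):
    """Return (first_start, second_start, spacer_length) for each candidate.

    max_spacer is an assumed cutoff; the text only says the spacing may be
    "at least 44 bp", so callers may want a larger value.
    """
    dna = dna.upper()
    hits = []
    for i in find_all(dna, FIRST_HALF_SITE):
        for j in find_all(dna, SECOND_HALF_SITE):
            spacer = j - (i + len(FIRST_HALF_SITE))
            if 0 <= spacer <= max_spacer:
                hits.append((i, j, spacer))
    return hits


# Hypothetical test sequence, not taken from any promoter.
print(find_bipartite_sites("TTCACCTAAACACCTGTT"))
```

A hit from such a scan is only a candidate; whether a protein binds the element still has to be shown experimentally.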

[0011] This binding may be generalized to other transcription factors that contain
separated clusters of zinc fingers and may be applied to other Smad-binding proteins. Moreover,
the Smad-interacting protein SIP1 shows high expression in E-cadherin-negative human carcinoma
cell lines, resulting in down regulation of E-cadherin transcription. Conditional expression of SIP1
in E-cadherin-positive [...] simultaneously induced invasion. Hence, SIP1 can be considered as a
potent invasion promoter molecule, and compounds, such as anti-SIP1 antibodies, small molecules
specifically binding to SIP1, anti-sense nucleic acids and ribozymes, which interfere with SIP1
production or activity, can prevent tumor invasion and metastasis.

[0012] The invention thus includes a method of identifying transcription factors such as 
activators and/or repressors. The method comprises providing cells with a nucleic acid sequence at 
least comprising a sequence CACCT ([the first 5 nucleotides of ]SEQ ID NO:1) or AGGTG ([the
first 5 nucleotides of ]SEQ ID NO:3) (preferably, twice the CACCT ([the first 5 nucleotides of]
SEQ ID NO:1) sequence) as bait for the screening of a library encoding potential transcription
factors and performing a specificity test to isolate the factors. 

[0013] In another embodiment, the bait comprises one of the sequences CACCT-N-CACCT
(a first SEQ ID NO:1 and a second SEQ ID NO:1 separated by N), CACCT-N-AGGTG
(SEQ ID NO:[2] 1 and SEQ ID NO:3 separated by N), AGGTG-N-CACCT (SEQ ID NO:3 and
SEQ ID NO:1 separated by N) or AGGTG-N-AGGTG (a first SEQ ID NO:[4] 3 and a second SEQ
ID NO:3 separated by N), wherein N is a spacer sequence. The spacer sequence can vary in
length and can contain any number of base pairs ("bp") from N=0 bp to at least N=44 bp. Thus,
for example, N can be 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 35, 40, 45, 50, 60, 70, 80, 90,
100, 200, 300 or 400 bp in length.
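
For illustration, the four bait arrangements listed above can be assembled programmatically for any chosen spacer. The following Python sketch is a hypothetical helper, not part of the claimed method. The names `build_baits` and `reverse_complement` are assumptions, and the spacer is supplied by the user because the text does not prescribe its sequence. The sketch also shows that AGGTG is the reverse complement of CACCT.

```python
HALF_SITES = ("CACCT", "AGGTG")  # the two half-sites named in the text
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def build_baits(spacer):
    """Return the four half-site/spacer/half-site combinations as a dict.

    Keys follow the notation of the text (for example "CACCT-N-AGGTG");
    values are the concatenated sequences for the given spacer.
    """
    spacer = spacer.upper()
    if set(spacer) - set("ACGT"):
        raise ValueError("spacer must contain only A, C, G or T")
    return {
        f"{left}-N-{right}": left + spacer + right
        for left in HALF_SITES
        for right in HALF_SITES
    }


# Example with an arbitrary, hypothetical 4 bp spacer.
for name, sequence in build_baits("ACGT").items():
    print(name, sequence)
```

An empty string gives the N=0 case, and longer spacers can be passed in the same way.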

[0014] The transcription factor(s) identified using a method according to the invention 
comprises separated clusters of zinc fingers such as, for example, two-handed zinc finger 
transcription factors. 

[0015] These sequences may originate from any promoter region, but preferably from
the group (also referred to as "target genes") selected from Brachyury, α4-integrin, follistatin or
E-cadherin.

[0016] The invention includes the transcription factors obtainable by and produced by a 
method according to the invention. 

[0017] In another embodiment, the invention relates to a method of identifying,
isolating, and/or producing compounds with an interference capability towards transcription
factors, obtained as described herein. For example, the invention includes a method involving
adding a sample comprising a potential compound to be identified to a test system comprising (i) a
nucleotide sequence comprising one of a first SEQ ID NO:1 and a second SEQ ID NO:1 separated
by N, SEQ ID NO:[2] 1 and SEQ ID NO:3 separated by N, SEQ ID NO:3 and SEQ ID NO:1
separated by N, or a first SEQ ID NO:[4] 3 and a second SEQ ID NO:3 separated by N, wherein N,
in these sequences, is a spacer sequence as previously described, and (ii) a protein capable to bind the
nucleotide sequence, incubating the sample in the system for a period sufficient to permit
interaction of the compound or its derivative or counterpart thereof with the protein and comparing
the amount and/or activity of the protein bound to the nucleotide sequence before and after the
addition.

[0018] Comparison of the amount of protein bound to the nucleotide sequence before 
and after adding the test sample can be accomplished, for example, by using a gel band-shift assay 
or a filter-binding assay. As a next step the compound thus identified can be isolated and 
optionally purified and further analyzed according to methods known to persons skilled in the art. 
The protein in step a) (ii) can be any protein capable to bind the nucleotide sequence, but is 
preferably a Smad-interacting protein such as SIP1.
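
The before-and-after comparison described above can be expressed as simple arithmetic on quantified band or filter signals. The following Python sketch is a hypothetical illustration only. It assumes that the bound and free probe signals have already been quantified in arbitrary units, and the function names and numbers are not taken from the specification.

```python
def fraction_bound(bound_signal, free_signal):
    """Fraction of probe in the protein-bound state, from two signal values."""
    total = bound_signal + free_signal
    if total <= 0:
        raise ValueError("total signal must be positive")
    return bound_signal / total


def relative_change(before, after):
    """Relative change of the bound fraction after adding the test sample.

    Negative values indicate reduced binding, positive values increased binding.
    """
    if before <= 0:
        raise ValueError("bound fraction before addition must be positive")
    return (after - before) / before


# Hypothetical signal values in arbitrary units.
before = fraction_bound(bound_signal=30.0, free_signal=70.0)
after = fraction_bound(bound_signal=10.0, free_signal=90.0)
print(f"change in bound fraction: {relative_change(before, after):+.0%}")
```

Any threshold used to call a compound a hit would have to be chosen by the experimenter; none is given in the text.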

[0019] Compounds identified by the latter method are also part of the present 
invention. With the term 'compounds with an interference capability towards transcription factors' 
is meant compounds, which are able to modulate (e.g., to inhibit, to weaken, and/or to strengthen) 
the bioactivity of transcription factors. More specifically, the latter compounds are able to 
completely or partially inhibit the production and/or bioactivity of SIP1. Examples of such 
compounds are small molecules or anti-SIP1 antibodies or functional fragments derived thereof
specifically binding to SIP1 protein or anti-sense nucleic acids or ribozymes binding to mRNA
encoding SIP1 or small molecules binding the promoter region bound by SIP1. In this regard, the
present invention relates to compounds that modulate regulation of E-cadherin expression by SIP1.
More specifically, the present invention relates to compounds that, via inhibiting SIP1 production 
and/or activity prevent the down-regulation of the expression of the target gene E-cadherin. In 
other words, the present invention relates to compounds that can be used as a medicament to 
prevent or treat tumor invasion and/or metastasis, which is due to the down-regulation of
E-cadherin expression by SIP1. Methods to produce and use the latter compounds are exemplified
further. 

[0020] The invention also includes a test kit to perform the method comprising at least
(i) a nucleotide sequence comprising one of a first SEQ ID NO:1 and a second SEQ ID NO:1
separated by N, SEQ ID NO:[2] 1 and SEQ ID NO:3 separated by N, SEQ ID NO:3 and SEQ ID
NO:1 separated by N, or a first SEQ ID NO:[4] 3 and a second SEQ ID NO:3 separated by N,
wherein N, in these sequences, is a spacer sequence as previously described, and (ii) a protein
capable of binding the nucleotide sequence.

[0021] In another embodiment, the invention concerns an alternative to the so-called 
"two hybrid" screening assay as disclosed in the prior art. Several means and methods have been 
developed to identify binding partners of proteins. This has resulted in the identification of a 
number of respective binding proteins. Many of these proteins have been found using so-called 
"two hybrid" systems. Two-hybrid cloning systems have been developed in several labs (Chien et 
al., 1991; Durfee et al., 1993; Gyuris et al., 1993). All have three basic components: Yeast vectors 
for expression of a known protein fused to a DNA-binding domain, yeast vectors that direct 
expression of cDNA-encoded proteins fused to a transcription activation domain, and yeast 
reporter genes that contain binding sites for the DNA-binding domain. These components differ in 
detail from one system to the other. All systems utilize the DNA binding domain from either Gal4 
or LexA. The Gal4 domain is efficiently localized to the yeast nucleus where it binds with high 
affinity to well-defined binding sites that can be placed upstream of reporter genes (Silver et al., 
1986). LexA does not have a nuclear localization signal, but enters the yeast nucleus and, when 
expressed at a sufficient level, efficiently occupies LexA binding sites (operators) placed upstream 
of a reporter gene (Brent et al, 1985). No endogenous yeast proteins bind to the LexA operators. 
Different systems also utilize different reporters. Most systems use a reporter that has a yeast 
promoter, either from the GAL1 gene or the CYC1 gene, fused to lacZ (Yocum et al., 1984). These 
lacZ fusions either reside on multicopy yeast plasmids or are integrated into a yeast chromosome. 
To make the lacZ fusions into appropriate reporters, the GAL1 or CYC1 transcription regulatory 
regions have been removed and replaced with binding sites that are recognized by the DNA- 
binding domain being used. A screen for activation of the lacZ reporters is performed by plating 
yeast on indicator plates that contain X-Gal (5-bromo-4-chloro-3-indolyl-β-D-galactoside); on this
medium, yeast (in which the reporters are transcribed) produces beta-galactosidase and turns blue.
Some systems use a second reporter gene and a yeast strain that requires expression of this reporter
to grow on a particular medium. These "selectable marker" genes usually encode enzymes required
for the biosynthesis of an amino acid. Such reporters have the marked advantage of providing a
selection for cDNAs that encode interacting proteins, rather than a visual screen for blue yeast. To
make appropriate reporters from the marker genes, their upstream transcription regulatory elements
were replaced by binding sites for the DNA-binding domain. The HIS3 and LEU2 genes have both
been used as reporters in conjunction with appropriate yeast strains that require their expression to
grow on media lacking either histidine or leucine, respectively. Finally, different systems use 
different means to express activation-tagged cDNA proteins. 

[0022] In all current schemes, the cDNA-encoded proteins are expressed with an 
activation domain at the amino terminus. The activation domains used include the strong 
activation domain from Gal4, the very strong activation domain from the Herpes simplex virus 
protein VP16, or a weaker activation domain derived from bacteria, called B42. The
activation-tagged cDNA-encoded proteins are expressed either from a constitutive promoter, or from a
conditional promoter such as that of the GAL1 gene. Use of a conditional promoter makes it 
possible to quickly demonstrate that activation of the reporter gene is dependent on expression of 
the activation-tagged cDNA proteins. 

[0023] It is clear from the foregoing that two-hybrid systems for finding binding 
proteins have been used in the past. However, although the conventional two hybrid system has 
proven to be a valuable tool in finding proteinaceous molecules that can bind to other proteins it is 
an artificial system. A characteristic of a two hybrid system is that a fusion protein is made 
consisting of a part of which binding partners are sought and a reporter part that enables detection 
of binding. For finding relevant binding partners, several criteria must be met of which one is of 
course the correct choice of the region in the protein where binding to other proteins occurs. 
Another criterion, which is much more difficult if not impossible to predict accurately beforehand,
is obtaining correct folding of the region (i.e., a folding of the region sufficiently similar to the
folding of the region in the natural protein). Correct folding depends on, among other things, the
actual amino acid sequence chosen for generating the fusion protein. Another factor determining
the identification of relevant binding partners is the sensitivity with which binding can be detected. 
[0024] An alternative to the conventional two-hybrid system is also provided herein.
Thus, the invention provides an in vivo method and kit for detecting interactions between proteins
and the influence of other compounds on the interaction [...]
chimeric, or fused proteins. These two fused proteins each show, independent from one another, a
weak affinity towards a nucleic acid sequence comprising one of a first SEQ ID NO:1 and a second
SEQ ID NO:1 separated by N, SEQ ID NO:[2] 1 and SEQ ID NO:3 separated by N, SEQ ID NO:3
and SEQ ID NO:1 separated by N, or a first SEQ ID NO:[4] 3 and a second SEQ ID NO:3
separated by N, wherein N, in these sequences, is a spacer sequence as previously described.
However, when both fused proteins are independently bound to the sequence, and the test proteins
each available in each of two fused proteins are as a result thereof brought into close proximity, the
binding affinity towards the nucleic acid sequence comprising one of a first SEQ ID NO:1 and a
second SEQ ID NO:1 separated by N, SEQ ID NO:[2] 1 and SEQ ID NO:3 separated by N, SEQ ID
NO:3 and SEQ ID NO:1 separated by N, or a first SEQ ID NO:[4] 3 and a second SEQ ID NO:3
separated by N, wherein N, in these sequences, is a spacer sequence as previously described,
becomes much stronger. If the two test proteins indeed are able to interact, they bring, as a
consequence thereof, into close proximity the transcriptional activator's two domains. This
proximity is sufficient to cause transcription, which can be detected by the activity of a marker
gene located adjacent to the nucleic acid sequence comprising one of a first SEQ ID NO:1 and a
second SEQ ID NO:1 separated by N, SEQ ID NO:[2] 1 and SEQ ID NO:3 separated by N, SEQ ID
NO:3 and SEQ ID NO:1 separated by N, or a first SEQ ID NO:[4] 3 and a second SEQ ID NO:3
separated by N, wherein N, in these sequences, is a spacer sequence as previously described.

[0025] In accordance herewith a method is provided for detecting an interaction
between a first interacting protein and a second interacting protein comprising providing a suitable
host cell with a first fusion protein comprising a first interacting protein fused to a DNA binding
domain capable to bind a nucleic acid sequence comprising one of a first SEQ ID NO:1 and a
second SEQ ID NO:1 separated by N, SEQ ID NO:[2] 1 and SEQ ID NO:3 separated by N, SEQ ID
NO:3 and SEQ ID NO:1 separated by N, or a first SEQ ID NO:[4] 3 and a second SEQ ID NO:3
separated by N, wherein N, in these sequences, is a spacer sequence as previously described,
providing the suitable host cell with a second fusion protein comprising a second interacting
protein fused to a DNA binding domain capable to bind a nucleic acid sequence comprising one of
a first SEQ ID NO:1 and a second SEQ ID NO:1 separated by N, SEQ ID NO:[2] 1 and SEQ ID
NO:3 separated by N, SEQ ID NO:3 and SEQ ID NO:1 separated by N, or a first SEQ ID NO:[4] 3
and a second SEQ ID NO:3 separated by N, wherein N, in these sequences, is a spacer sequence as
previously described, subjecting the host cell to conditions under which the first interacting protein
and the second interacting protein are brought into close proximity and determining whether a
detectable gene present in the host cell and located adjacent to the nucleic acid sequence has been
expressed to a degree greater than expressed in the absence of the interaction between the first and
the second interacting protein.

[0026] As an example, it should be clear that, in case a binding partner (prey) for a 
specific protein (bait) has been identified, the first fusion protein containing the bait will for 
example bind to the sequence CACCT ([the first 5 nucleotides of ]SEQ ID NO:1) (or AGGTG
([the first five nucleic acids of ]SEQ ID NO:3)) of the sequence CACCT-N-AGGTG and [(SEQ ID
NO:2)] that the second fusion protein containing the prey will bind to the sequence AGGTG ([the
first five nucleic acids of ]SEQ ID NO:3) (or CACCT ([the first 5 nucleotides of ]SEQ ID NO:1)[,
respectively]) of the sequence CACCT-N-AGGTG [(SEQ ID NO:2)] so that transcription of a
marker gene will occur. 

[0027] The present invention finally relates to the new sequences a first SEQ ID NO:1
and a second SEQ ID NO:1 separated by N, SEQ ID NO:[2] 1 and SEQ ID NO:3 separated by N,
SEQ ID NO:3 and SEQ ID NO:1 separated by N, and a first SEQ ID NO:[4] 3 and a second SEQ
ID NO:3 separated by N, wherein N, in these sequences, is a spacer sequence as previously
described, and to the use of the sequences, in addition to any other sequence at least comprising a
sequence CACCT (SEQ ID NO:1), for the identification, via any method known by a person
skilled in the art, of new target genes different from the already described target genes Brachyury,
α4-integrin, follistatin or E-cadherin.

BRIEF DESCRIPTION OF THE FIGURES
[0028] FIG. 1 is a schematic representation of Zfh-1, SIP1 and δEF1, and alignment of
the SIP1 and δEF1 zinc fingers. (A) Schematic representation of mouse δEF1 (1117 amino acids)
and SIP1 (1214 amino acids). The filled boxes represent CCHH zinc fingers, the open boxes are
CCHC zinc fingers. The homeodomain-like domain (HD) is depicted as an oval. The percentage
represents the homology between different domains. SIP1 polypeptides used in this study are
depicted with their coordinates ([...], 1999). (B)
Alignments of the amino acid sequences from zinc fingers of SIP1 and δEF1. Vertical bars indicate
sequence identity. The conserved cysteine and histidine residues forming the zinc fingers are
printed in bold, and indicated by an asterisk. The residues in zinc fingers that can contact DNA are
indicated with an arrow. (C) Alignment of the [...]
and of δEF1 NZF3+NZF4 and δEF1 CZF2+CZF3, respectively, demonstrating intramolecular conservation
of zinc fingers.

[0029] FIG. 2 depicts possible DNA-binding mechanisms for SIP1. Model 1: SIP1
binds DNA as a monomer. Model 2: SIP1 binds DNA as a dimer. 

DETAILED DESCRIPTION OF THE INVENTION 
[0030] The following definitions are set forth to assist in the understanding of various 
terms used herein. 

[0031] "Nucleic acid" or "nucleic acid sequence" or "nucleotide sequence" means 
genomic DNA, cDNA, double stranded or single stranded DNA, messenger RNA or any form of 
nucleic acid sequence known to one of skill in the art. 

[0032] The terms "protein" and "polypeptide" used in this application are 
interchangeable. "Polypeptide" refers to a polymer of amino acids (amino acid sequence) and does 
not refer to a specific length of the molecule. Thus, peptides and oligopeptides are included within 
the definition of polypeptide. Included within the definition are, for example, polypeptides 
containing one or more analogs of an amino acid (e.g., unnatural amino acids, etc.), polypeptides 
with substituted linkages, as well as other modifications known in the art, both naturally occurring 
and non-naturally occurring. The proteins and polypeptides described above are not necessarily 
translated from a designated nucleic acid sequence; the polypeptides may be generated in any 
manner, including for example, chemical synthesis, or expression of a recombinant expression 
system, or isolation from a suitable viral system. 

[0033] The polypeptides may include one or more analogs of amino acids, 
phosphorylated amino acids, or unnatural amino acids. Methods of inserting analogs of amino 
acids into a sequence are known in the art. The polypeptides may also include one or more labels,
which are known to those skilled in the art. In this context, it is also understood that the proteins
may be [...]
which retain biological activity, namely, the mature, processed form. This allows the construction
of chimeric proteins and peptides comprising an amino acid sequence derived from the mature protein,
which is crucial for its binding activity. The other functional amino acid sequences may be either
physically linked by, for example, chemical [...]
DNA techniques well known in the art.

[0034] The term "derivative", "functional fragment of a sequence" or "functional part 
of a sequence" means a truncated sequence of the original reference sequence. The truncated 
sequence (nucleic acid or protein) can vary widely in length; the minimum size being a sequence 
of sufficient size to provide a sequence with at least a comparable function and/or activity of the 
original sequence referred to, while the maximum size is not critical. In some applications, the 
maximum size usually is not substantially greater than that required to provide the desired activity 
and/or function(s) of the original sequence. Typically, the truncated amino acid sequence will 
range from about 5 to about 60 amino acids in length. More typically, however, the sequence will 
be a maximum of about 50 amino acids in length, preferably a maximum of about 30 amino acids. 
It is usually desirable to select sequences of at least about 10, 12 or 15 amino acids, up to a 
maximum of about 20 or 25 amino acids. 

[0035] The terms "gene(s)", "polynucleotide", "nucleic acid sequence", "nucleotide 
sequence", "DNA sequence" or "nucleic acid molecule(s)" as used herein refers to a polymeric 
form of nucleotides of any length, either ribonucleotides or deoxyribonucleotides. This term refers 
only to the primary structure of the molecule. Thus, this term includes double- and single-stranded 
DNA, and RNA. It also includes known types of modifications, for example, methylation, "caps",
and substitution of one or more of the naturally occurring nucleotides with an analog.

[0036] A "coding sequence" is a nucleotide sequence, which is transcribed into mRNA
and/or translated into a polypeptide when placed under the control of appropriate regulatory 
sequences. The boundaries of the coding sequence are determined by a translation start codon at 
the 5'-terminus and a translation stop codon at the 3'-terminus. A coding sequence can include, but 
is not limited to mRNA, cDNA, recombinant nucleotide sequences or genomic DNA, while 

introns may be present as well under certain circumstances. 

[0037] With "transcription factor" is meant a class of proteins that bind to a promoter [...]

[0038] With "promoter" is meant an oriented DNA sequence recognized by the RNA
polymerase holoenzyme to initiate transcription. 

[0039] With "RNA polymerase" is meant a [...]
RNA complementary to the DNA template.

[0040] With "holoenzyme" is meant an active form of enzyme that consists of multiple 
subunits. 

[0041] The term 'antibody' or 'antibodies' refers to an antibody characterized as being 
specifically directed against a transcription factor such as SIP1 or any functional derivative
thereof, with the antibodies being preferably monoclonal antibodies; or an antigen-binding
fragment thereof, of the F(ab')2, F(ab) or single chain Fv type, or any type of recombinant antibody
derived thereof. Monoclonal antibodies can for instance be produced by a hybridoma liable to be
formed according to classical methods from an animal's splenic cells, particularly of a mouse or rat
immunized against SIP1 or any functional derivative thereof, and of cells of a myeloma cell line,
and to be selected by the ability of the hybridoma to produce the monoclonal antibodies 
recognizing SIP1 or any functional derivative thereof which have been initially used for the 
immunization of the animals. Monoclonal antibodies may be humanized versions of mouse 
monoclonal antibodies made by means of recombinant DNA technology, departing from the 
mouse and/or human genomic DNA sequences coding for H and L chains or from cDNA clones 
coding for H and L chains. Alternatively, the monoclonal antibodies may be human monoclonal 
antibodies. Such human monoclonal antibodies are prepared, for instance, by means of human 
peripheral blood lymphocytes (PBL) repopulation of severe combined immune deficiency (SCID) 
mice as described in International Patent Application PCT/EP 99/03605 or by using transgenic 
non-human animals capable of producing human antibodies as described in U.S. Patent 5,545,806, 
the contents of both of which are incorporated by this reference. Also, fragments derived from 
these monoclonal antibodies such as Fab, F(ab')2 and ssFv ("single chain variable fragment"), 
form part of the present invention provided that they have retained the original binding properties. 
Such fragments are commonly generated by, for instance, enzymatic digestion of the antibodies 
with papain, pepsin, or other proteases. It is well known to the person skilled in the art that 
monoclonal antibodies, or fragments thereof, can be modified for various uses. The antibodies can 
also be labeled with an appropriate label of the enzymatic, fluorescent, or radioactive type. 

[0042] The term 'small molecules' refers to, for example, small organic molecules, and 
other drug candidates, which can be obtained, for example, from combinatorial and natural product 
libraries via methods well known in the art. Random peptide libraries consisting of all possible 
combinations of amino acids attached to a solid phase support may be used to identify peptides 
that are able to bind to SIP1 or to the promoter region bound by SIP1. The screening of peptide 
libraries may have therapeutic value in the discovery of pharmaceutical agents that act to inhibit 
the biological activity of SIP1. 

[0043] The terms 'anti-sense nucleic acids' and 'ribozymes' refer to molecules that 
function to inhibit the translation of SIP1 mRNA. Anti-sense nucleic acids or anti-sense RNA and 
DNA molecules act to directly block the translation of mRNA by binding to targeted mRNA and 
preventing protein translation. Ribozymes are enzymatic RNA molecules capable of catalyzing the 
specific cleavage of RNA. Ribozymes' mechanism of action involves sequence specific 
hybridization of the ribozyme molecule to complementary target RNA, followed by an 
endonucleolytic cleavage. Within the scope of the invention are engineered hammerhead motif 
ribozyme molecules that specifically and efficiently catalyze endonucleolytic cleavage of SIP1 
RNA sequences. Specific ribozyme cleavage sites within any potential RNA target are initially 
identified by scanning the target molecule for ribozyme cleavage sites (e.g., GUA, GUU and 
GUC). Once identified, short RNA sequences of between 15 and 20 ribonucleotides corresponding 
to the region of the target gene containing the cleavage site may be evaluated for predicted 
structural features such as secondary structure that may render the oligonucleotide sequence 
unsuitable. A candidate target's suitability may also be evaluated by testing its accessibility to 
hybridization with complementary oligonucleotides, using ribonuclease protection assays. Both 
anti-sense RNA and DNA molecules and ribozymes of the invention may be prepared, for 
example, by any method known in the art for the synthesis of RNA molecules. These include 
techniques for chemically synthesizing oligodeoxyribonucleotides well known in the art such as 
for example solid phase phosphoramidite chemical synthesis. Alternatively, RNA molecules may 
be generated by in vitro and in vivo transcription of DNA sequences encoding the antisense RNA 
molecule. Such DNA sequences may be incorporated into a wide variety of vectors that 
incorporate suitable RNA polymerase promoters such as the T7 or SP6 polymerase promoters. 
Alternatively, antisense cDNA constructs that synthesize anti-sense RNA constitutively or 
inducibly, depending on the promoter used, can be introduced stably into cell lines. 
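
By way of illustration only, the scanning step described above (locating GUA, GUU and GUC triplets in a target RNA and taking a stretch of 15 to 20 ribonucleotides around each one) can be sketched in Python. The function names, the rule used to centre the window and the example sequence are assumptions made for this sketch and are not taken from the specification; the evaluation of secondary structure and the ribonuclease protection assays are not modeled.

```python
import re

# Triplets named in the text as ribozyme cleavage sites.
CLEAVAGE_SITE = re.compile(r"(?=(GU[AUC]))")  # lookahead keeps overlapping hits


def find_cleavage_sites(rna):
    """Return (0-based position, triplet) for each GUA, GUU or GUC in the sequence."""
    rna = rna.upper().replace("T", "U")  # accept a DNA-style spelling as well
    return [(m.start(), m.group(1)) for m in CLEAVAGE_SITE.finditer(rna)]


def target_window(rna, site_pos, length=17):
    """Return a 15-20 nt stretch of the target that contains the triplet at site_pos.

    The window is centred on the triplet where possible and shifted inwards
    near the ends; None is returned if the target is shorter than `length`.
    """
    if not 15 <= length <= 20:
        raise ValueError("length must be between 15 and 20 nucleotides")
    rna = rna.upper().replace("T", "U")
    if len(rna) < length:
        return None
    start = site_pos + 1 - length // 2
    start = max(0, min(start, len(rna) - length))
    return rna[start:start + length]


if __name__ == "__main__":
    example = "AAAGUCAAAGUUAAAGUAAAAGUG"  # made-up sequence, not SIP1 mRNA
    for pos, triplet in find_cleavage_sites(example):
        print(pos, triplet, target_window(example, pos, length=15))
```

Each window returned in this way would still have to be assessed for structural features and accessibility as described above before being considered a candidate target.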

[0044] The mentioned antibodies, small molecules, anti-sense nucleic acids and 
ribozymes can be used as 'a medicament' to prevent and/or treat tumor invasion and/or metastasis 
via inhibiting the down-regulation of E-cadherin expression by SIP1. Malignancy of tumors 
implies an inherent tendency of the tumor's cells to metastasize (invade the body widely and 
become disseminated by subtle means) and eventually to kill the patient unless all the malignant 
cells can be eradicated. Metastasis is thus the outstanding characteristic of malignancy. Metastasis 
is the tendency of tumor cells to be carried from their site of origin by way of the circulatory 
system and other channels, which may eventually establish these cells in almost every tissue and 
organ of the body. In contrast, the cells of a benign tumor invariably remain in contact with each 
other in one solid mass centered on the site of origin. Because of the physical continuity of benign 
tumor cells, they may be removed completely by surgery if the location is suitable. But the 
dissemination of malignant cells, each one individually possessing (through cell division) the 
ability to give rise to new masses of cells (new tumors) in new and distant sites, precludes 
complete eradication by a single surgical procedure in all but the earliest period of growth. It 
should be clear that the 'medicament' of the present invention could be used in combination with 
any other tumor therapy known in the art such as irradiation, chemotherapy or surgery. 

[0045] With regard to the above-mentioned small molecules, the term 'medicament' 
relates to a composition comprising small molecules as described above and a pharmaceutically 
acceptable carrier or excipient (both terms can be used interchangeably) to treat diseases as 
indicated above. Suitable carriers or excipients known to the skilled man are saline, Ringer's 
solution, dextrose solution, Hank's solution, fixed oils, ethyl oleate, 5% dextrose in saline, 
substances that enhance isotonicity and chemical stability, buffers and preservatives. Other 
suitable carriers include any carrier that does not itself induce the production of antibodies harmful 
to the individual receiving the composition such as proteins, polysaccharides, polylactic acids, 
polyglycolic acids, polymeric amino acids and amino acid copolymers. 

[0046] The 'medicament' may be administered by any suitable method within the 
knowledge of the skilled man. The preferred route of administration is parenterally. In parenteral 
administration, the medicament of this invention will be formulated in a unit dosage injectable 
form such as a solution, suspension or emulsion, in association with the pharmaceutically 
acceptable excipients as defined above. 

[0047] However, the dosage and mode of administration 
Generally, the medicament is administered so that the molecule of the present invention is given at a 
dose between 1 ng/kg and 10 mg/kg, more preferably between 10 µg/kg and 5 mg/kg, most 
preferably between 0.1 and 2 mg/kg. Preferably, it is given as a bolus dose. Continuous infusion 
may also be used and includes continuous subcutaneous delivery via an osmotic minipump. If so, 
the medicament may be infused at a dose between 5 and 20 µg/kg/minute, more preferably 
between 7 and 15 µg/kg/minute. 

[0048] With regard to antibodies, anti-sense nucleic acids, and ribozymes, a preferred 
mode of administration of the 'medicament' for treatment is the use of gene therapy to deliver the 
above-mentioned molecules. Gene therapy means the treatment by the delivery of therapeutic 
nucleic acids to a patient's cells. This is extensively reviewed in Lever and Goodfellow 1995; Br. 
Med. Bull., 51, 1-242 (Culver 1995); Ledley, F.D. Hum. Gene Ther. 6, 1129 (1995). To achieve 
gene therapy there must be a method of delivering genes to the patient's cells and additional 
methods to ensure the effective production of any therapeutic genes. Two general approaches exist 
to achieve gene delivery; these are non-viral delivery and virus-mediated gene delivery. 

[0049] The following examples more fully illustrate preferred features of the invention, 
but should not be construed to limit the invention in any way. 

EXAMPLES 

Characterization of nucleic acid sequences at least comprising a CACCT (SEQ ID NO:1) 
sequence. 

- SIP1 and δEF1 bind to target sites containing one CACCT (SEQ ID NO:1) 
sequence and one CACCTG (SEQ ID NO:2) sequence 

[0050] The DNA binding properties of SIP1 were studied. SIP1, a recently isolated 
Smad-interacting protein, belongs to the emerging family of two-handed zinc finger transcription 
factors (34). The organization of SIP1 is very similar to that of δEF1, the prototype member of this 
family. Both proteins contain two widely separated clusters of zinc fingers, which are involved in 
binding to DNA. The amino acid sequence homology is very high (more than 90%) within these 
two zinc finger clusters, whereas it is less evident in the other regions. This finding suggests that 
both proteins would bind in an analogous fashion to similar DNA targets. Indeed, SIP1 as well as 
δEF1 bind with comparable affinities to many different target sites, which always contain two 
CACCT (SEQ ID NO:1) sequences. 

[0051] SIP1FS inhibits Xbra2 expression when over-expressed in the Xenopus embryo 
(34), and SIP1FS binds to the Xbra2 promoter by contacting two CACCT (SEQ ID NO:1) 
sequences. Recent studies using Xenopus transgenic embryos have shown that 2.1 kb of Xbra2 
promoter sequences suffice to express a reporter protein in the same domain as Xbra itself (17). 
However, a single point mutation within the downstream CACCT (SEQ ID NO:1) site (Xbra-D) in 
the promoter that disrupts SIP1 binding (as seen in gel retardation assays) has a severe effect. 
Expression of the marker protein initiates earlier (i.e., at stage 9), and is now found at ectopic sites, 
for example, in the majority of ectodermal, mesodermal, and endodermal cells (17). This finding 
indicates that this nucleotide, which is located within the downstream CACCT (SEQ ID NO:1) 
site, is required for correct spatial and temporal expression of the Xbra2 gene. In addition, when a 
mutation is introduced in the upstream CACCT (SEQ ID NO:1) sequence, we observed the same 
premature and ectopic expression of Xbra2 as for the mutation within the downstream CACCT 
(SEQ ID NO:1) site. Therefore, mutations in either the downstream or upstream CACCT (SEQ ID 
NO:1) that are known to affect SIP1 or δEF1 binding in EMSA give the same phenotype in vivo, 
indicating that a Xenopus δEF1-like protein participates in the regulation of the Xbra2 gene. In 
addition, these in vivo data support the conclusions from the in vitro binding experiments 
presented here: SIP1/δEF1-like transcription factors require two CACCT (SEQ ID NO:1) sites for 
regulating the expression of the Xbra2 promoter. 

[0052] Not all promoter regions containing two CACCT (SEQ ID NO:1) sequences 
represent SIP1 or δEF1 binding sites. Notably, duplication of the Xbra-F probe, which contains the 
upstream CACCT (SEQ ID NO:1) sequence present in the Xbra-WT element, is refractory to 
binding of either SIP1 or δEF1. Moreover, neither SIP1NZF nor SIP1CZF can bind efficiently to this 
site (Xbra-F) as monomer or as dimer. Thus other sequences in addition to CACCT (SEQ ID 
NO:1) may be 

CACCTG (SEQ ID NO:2) is always a better target site for binding of these zinc finger clusters. Indeed, the high- 
affinity CACCTG (SEQ ID NO:2) site (Xbra-E) was shown to bind either the SIP1NZF or the 
SIP1CZF cluster. In addition, modification of the CACCTG (SEQ ID NO:2) site into CACCTA 
strongly affects the binding of SIP1FS and δEF1 to the Xbra 

of this 3'-guanine residue. By comparing the sequence of all the SIP1 and δEF1 target sites, a 
minimal consensus sequence was found composed of one CACCT (SEQ ID NO:1) sequence and 
one CACCTG (SEQ ID NO:2) sequence, demonstrating that these two sequences are sufficient to 
form a high-affinity binding site for SIP1 or δEF1. 
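
As a non-limiting sketch, the minimal consensus described above (one CACCT sequence plus one CACCTG sequence, on either strand) can be expressed as a simple sequence check in Python. The function names and the test sequences are hypothetical and were chosen for this illustration only; they are not the probes of Table 1. The check captures the sequence criterion alone and does not predict binding, since, as noted above, not every region with two CACCT sequences is bound.

```python
def find_sites(seq):
    """List CACCT sites on both strands of a double-stranded DNA.

    `seq` is the upper strand, 5' to 3'. Each hit is a tuple
    (start on the upper strand, strand, has_3prime_G), where has_3prime_G
    is True when the site reads CACCTG on the strand that carries it.
    """
    seq = seq.upper()
    sites = []
    for i in range(len(seq) - 4):
        five = seq[i:i + 5]
        if five == "CACCT":
            # Upper-strand site: the 3' G is the next base on the same strand.
            sites.append((i, "+", seq[i + 5:i + 6] == "G"))
        elif five == "AGGTG":
            # Lower-strand site: CACCTG there appears as CAGGTG here.
            sites.append((i, "-", i > 0 and seq[i - 1] == "C"))
    return sites


def matches_minimal_consensus(seq):
    """True if seq holds at least two CACCT sites, one of which is CACCTG."""
    sites = find_sites(seq)
    return len(sites) >= 2 and any(has_g for _, _, has_g in sites)


if __name__ == "__main__":
    probe = "TT" + "CACCTG" + "AAAAAA" + "AGGTG" + "TT"  # made-up example
    print(find_sites(probe))
    print(matches_minimal_consensus(probe))
```

Scanning both strands reflects the observation, reported further on, that the relative orientation of the two sequences is not critical.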

[0053] Although the upstream CACCT (SEQ ID NO:1) sequence is unable to bind 
SIP1CZF or SIP1NZF, this sequence is contacted by full size SIP1 in the context of the Xbra-WT 
probe. The upstream CACCT (SEQ ID NO:1) sequence is a prerequisite for the binding of SIP1FS 
to the Xbra-WT probe. Thus, when the upstream CACCT (SEQ ID NO:1) sequence is combined 
with another, high-affinity CACCTG (SEQ ID NO:2) site (Xbra-E), this low affinity site (Xbra-F) 
becomes committed to the binding of SIP1FS. A model is favored in which SIP1FS contacts its target promoter 
via the binding of one of its zinc finger clusters to a high affinity CACCTG (SEQ ID NO:2) 
sequence (e.g., Xbra-E), which is followed by the contact of the low affinity CACCT 
(SEQ ID NO:1) site (Xbra-F) by the second cluster, and this additional interaction strongly 
stabilizes SIP1 binding. Therefore, a CACCT (SEQ ID NO:1) site may still have an important 
function in the regulation of gene expression, even though on its own it binds neither SIP1NZF, 
SIP1CZF nor SIP1FS. 

[0054] The DC5 probe from the δ1-crystallin enhancer binds δEF1 specifically (31). 
However, this probe contains only one CACCT (SEQ ID NO:1) sequence. Therefore, despite 
having demonstrated here that high affinity binding sites for δEF1 should contain one CACCT 
(SEQ ID NO:1) sequence and one CACCTG (SEQ ID NO:2) sequence, it cannot be excluded that 
in particular cases, such as the DC5 probe, one CACCT (SEQ ID NO:1) site would be sufficient 
for the binding of this type of transcription factor. 

- Mode of SIP1 DNA binding 

[0055] When tested independently in EMSA, both the C-terminal as well as the N-terminal 
zinc finger clusters of SIP1 or δEF1 bind to very similar CACCT (SEQ ID NO:1) 
containing consensus sequences. Both for SIP1 and δEF1, NZF3 and NZF4 share an extensive 
amino acid sequence homology with CZF2 and CZF3, respectively. This homology may explain 
why these two clusters can bind to similar consensus sequences. In addition, it has been shown that 
SIP1 or δEF1 require two CACCT (SEQ ID NO:1) sequences for binding to several potential 
target sites. Based on these results, it is 

elements in such a way that one zinc finger cluster contacts one of the CACCT (SEQ ID NO:1) 
sites, while the other cluster contacts the second CACCT (SEQ ID NO:1) site (see, FIG. 2, "Model 
1"). An alternative model could be that SIP1 or δEF1 homodimerizes before being able to bind to 
these target sites with high affinity ("Model 2"). The DNA binding capacity of SIP1NZF is 
abolished by mutations in either NZF3 or NZF4. Similarly, mutations within CZF2 or CZF3 also 
affect the binding capacity of SIP1CZF. When these mutations are introduced in the context of the 
full size SIP1, binding of SIP1FS is no longer observed. This observation indicates that the binding 
activity of both zinc finger clusters is required for the binding of SIP1FS to its target element, 
containing a doublet of CACCT (SEQ ID NO:1) sites. Similarly, it was previously shown that the 
integrity of both zinc finger clusters of δEF1 is needed for binding DNA (31). These observations 
indicate that both zinc finger clusters are directly contacting the DNA. Therefore, in the dimer 
model (FIG. 2, Model 2), the SIP1NZF of one SIP1 molecule should bind to one CACCT (SEQ ID 
NO:1) sequence and the SIP1CZF of the second SIP1 molecule should contact the other CACCT 
(SEQ ID NO:1) sequence. If such a dimer configuration exists, then it can be assumed that certain 
combinations of full size SIP1 molecules having different mutations within CZF or NZF, 
respectively, should allow for the formation of a functional dimer able to bind to its target DNA. 
None of the possible combinations of the four SIP1FS mutants tested (NZF3mut, NZF4mut, 
CZF2mut and CZF3mut) gave rise to a DNA/SIP1 complex in EMSAs. This finding argues against 
the existence of SIP1 dimers. In addition, using differently tagged SIP1FS molecules, it was not 
possible to detect SIP1 dimers in EMSAs, nor to supershift such dimeric complexes with different 
antibodies. Therefore, support is provided for "Model 1" in which SIP1 binds as a monomer to a 
target site, which contains one CACCT (SEQ ID NO:1) sequence and one CACCTG (SEQ ID 
NO:2) sequence. 

[0056] It has been shown herein that neither the relative orientation of the two CACCT 
(SEQ ID NO:1) sequences nor the spacing between these sequences is critical for the binding of 

flexible secondary structure to accommodate the binding to these different target sites. The long 
linker region between the two zinc finger clusters within SIP1 and δEF1 may permit this flexibility 
in the secondary structure of these proteins. These transcription factors can bind to sites 
containing CACCT (SEQ ID NO:1) sequences separated by at least 44 bp (Ecad-WT), suggesting 
that a region of about 50 bp of promoter sequences might be covered and therefore less accessible 
to transcriptional activators once SIP1FS or δEF1 is bound to this promoter. This indicates that 
SIP1 or δEF1 could function as a transcriptional repressor by competing with transcriptional 
activators that bind in this region covered by SIP1 or δEF1. 
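
The relationship between spacing and the stretch of promoter that might be covered can be illustrated with a short Python sketch. It assumes that "spacing" means the number of nucleotides between two CACCT sequences and that the covered stretch runs from the first base of one site to the last base of the other; the function names and the test sequences are hypothetical, and sites are counted on either strand because orientation was found not to be critical.

```python
def site_starts(seq):
    """Start positions (upper strand, 0-based) of CACCT sites on either strand."""
    seq = seq.upper()
    # AGGTG on the upper strand is CACCT read on the lower strand.
    return [i for i in range(len(seq) - 4) if seq[i:i + 5] in ("CACCT", "AGGTG")]


def spacing_and_span(seq):
    """Return (spacing, span) for the two outermost CACCT sites, or None.

    spacing: nucleotides lying between the two 5-nt sites.
    span:    nucleotides from the start of the first site to the end of the
             last one, i.e. spacing + 10.
    """
    starts = site_starts(seq)
    if len(starts) < 2:
        return None
    first, last = starts[0], starts[-1]
    spacing = last - (first + 5)
    span = (last + 5) - first
    return spacing, span


if __name__ == "__main__":
    # Made-up sequence with 44 nucleotides between the sites, as for Ecad-WT.
    probe = "CACCT" + "A" * 44 + "AGGTG"
    print(spacing_and_span(probe))
```

Under these assumptions, a spacing of 44 corresponds to a span of 54 nucleotides, which is in line with the region of about 50 bp mentioned above.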

- Other families of transcription factors may bind DNA with a similar mechanism 
as SIP1 

[0057] This new mode of DNA binding may also be generalized to other transcription 
factor families, which, like SIP1 and δEF1, contain separated clusters of zinc fingers like those of 
the MBP/PRDII-BF1 family (Refs. 1, 3, 6, 29, 33). As with SIP1 and δEF1, the conservation of 
these zinc finger clusters is very strong between the different members of this family (1). In 
addition, the C-terminal cluster is very homologous to the N-terminal cluster and, in the case of 
PRDII-BF1, these clusters bind to the same sequences when tested independently (3). Therefore, 
this type of transcription factor may bind to two reiterated sequences through the contact of one 
zinc finger cluster with one sequence and the other cluster with the second sequence. Similarly, the 
different members of the NZF family of transcription factors also have two widely separated 
clusters of zinc fingers (Refs. 11, 12, 36). MyT1, NZF-1 and NZF-3 all bind to the same consensus 
element AAAGTTT (SEQ ID NO:4). As with SIP1 and δEF1, which show a significantly higher 
affinity for elements containing 2 CACCT (SEQ ID NO:1) sequences, NZF-3 demonstrated a 
markedly higher affinity for an element containing 2 AAAGTTT (SEQ ID NO:4) sequences (36). 
This suggests that 2 AAAGTTT (SEQ ID NO:4) sequences are needed to create a high-affinity 
binding site for these transcription factors, and that they may bind DNA with a similar mechanism 
as SIP1 and δEF1. Finally, the Evi-1 protein, which contains 7 zinc fingers at the N-terminus and 3 
zinc fingers at the C-terminus, binds to two consensus sequences. It binds to a complex consensus 
sequence (GACAAGATAAGATAA-N1-28-CTCATCTTC (SEQ ID NO:5)) via a mechanism 
that may involve the binding of the N-terminal zinc finger cluster to the first part and the binding 
of the C-terminal cluster to the second part (20). In conclusion, the mode of DNA binding that is 
described here may not only be applicable to the SIP1/δEF1 family of transcription factors, but 
appears to be more universal. 

[0058] SIP1 was cloned as a Smad1-interacting protein but was also shown to interact 
with Smad2, 3 and 5 (34). Smad proteins are signal transducers involved in the BMP/TGF-β 
signaling cascade (13). Upon binding of TGF-β ligands to the serine/threonine kinase receptor 
complex, the receptor-regulated Smad proteins are phosphorylated by type I receptors, and migrate 
to the cell nucleus where they modulate transcription of target genes. The interaction between SIP1 
and Smads has only been observed upon ligand stimulation, indicating that Smads need to be 
activated before they are capable of interacting with SIP1 (34). Surprisingly, Evi-1, a transcription 
factor that may bind DNA with a similar mechanism as SIP1, is a Smad3-interacting protein (15). 
So far, it was shown that Evi-1 inhibited the binding of Smad3 to DNA, but certainly has an effect 
on target promoters of Evi-1. Schnurri, which is the Drosophila homologue of the human PRDII- 
BF1 transcription factor, is a protein that may also bind DNA with a similar mechanism as SIP1 
protein. Interestingly, Schnurri was proposed to be a nuclear protein target in the dpp-signaling 
pathway (1, 6). Dpp is a member of the TGF-β family. This makes Schnurri a candidate nuclear 
target for Drosophila Mad protein, the Drosophila homologue of vertebrate Smads. Therefore, the 
mode of DNA binding employed by SIP1 can be generalized to other zinc finger containing Smad- 
interacting proteins, and represents a common feature of several Smad partners in the nucleus. 

[0059] These results demonstrate a novel mode of DNA binding for the δEF1 family of 
transcription factors. This mode of DNA binding is also relevant to other families of transcription 
factors that contain separated clusters of zinc fingers. 

Materials and methods 
Plasmid constructions. 

[0060] For expression in mammalian cells, the SIP1 (34) and δEF1 (5) cDNAs were 
subcloned into pCS3 (27). In this plasmid, the SIP1 and δEF1 open reading frames are fused to a 
(Myc)6 tag at the N-terminus. SIP1 cDNA was also cloned into pcDNA3 (Invitrogen) as an N- 
terminal fusion with the FLAG tag. For the expression of SIP1NZF and SIP1CZF, we sub-cloned into 
pCS3 the cDNA fragments 

SIP1CZF (amino acids 957 to 1156) and SIP1NZF (amino acids 90 to 383) were also produced in E. coli as 
a GST fusion protein (in pGEX-5X-1, Pharmacia) and purified using the GST purification module 
(Pharmacia). Identical mutations to those made in AREB6 (10) were also introduced in the SIP1 
zinc fingers. Mutagenesis of zinc fingers NZF3, NZF4, 

their third His to a Ser. These mutations were introduced using a PCR based approach with the 
following primers: 

SIP1NZF3Mut, 5'-CCACCTGAAAGAATCCCTGAGAATTCACAG (SEQ ID NO:6); 
SIP1NZF4Mut, 5'-GGGTCCTACAGTTCATCTATCAGCAGCAAG (SEQ ID NO:7); 
SIP1CZF2Mut, 5'-CACCACCTTATCGAGTCCTCGAGGCTGCAC (SEQ ID NO:8); 
SIP1CZF3Mut, 5'-TCCTACTCGCAGTCCATGAATCACAGGTAC (SEQ ID NO:9). 

[0061] The respective mutated clusters were re-cloned in full size SIP1 in pCS3 in 
order to produce in mammalian cells the mutated SIP1 proteins named NZF3mut, NZF4mut, 
CZF2mut and CZF3mut, respectively. Furthermore, these mutated clusters were sub-cloned into 
pGEX5-X2 (Pharmacia), and produced in E. coli as a GST fusion protein (GST-NZF3mut, GST- 
NZF4mut, GST-CZF2mut and GST-CZF3mut). All constructs were confirmed by restriction 
mapping and sequencing. 

Cell culture and DNA transfection. 

[0062] COS1 cells were grown in DMEM supplemented with 10% fetal bovine serum. 
Cells were transfected using Fugene according to the manufacturer's protocol (Boehringer 
Mannheim), and collected 30-48 hrs after transfection. 

Gel retardation assay. 

[0063] The Xbra-WT oligonucleotide covers the region from -344 to -294 of the 
Xbra2 promoter (16). The region from -412 to -352 of the α4-integrin promoter is present 
within the α4I-WT oligonucleotide (26). The Ecad-WT probe contains the region from -86 to 
-17 of the human Ecad promoter (2). The sequences of the upper strand of the wild type and 
mutated double-stranded probes are listed in Table 1. Double-stranded oligonucleotides were 
labeled with [32P]-γ-ATP and T4 polynucleotide kinase (New England Biolabs). Total cell extracts 
were prepared from COS1 cells (25) transfected with different pCS3 vectors allowing synthesis of 
full-length 

equal amounts of Myc-tagged SIP1 and FLAG-tagged SIP1. GST-SIP1 fusion proteins were 
purified from E. coli extract using the GST purification module (Pharmacia), and tested in gel 
retardation. The DNA binding assay (20 µl) was performed at 25°C, with 1 µg of COS1 total cell 
protein 

Cerenkov counts) in the δEF1 binding buffer described previously (30). For supershift 
experiments, the extracts were incubated with anti-Myc (Santa Cruz) or anti-FLAG (Kodak) 
antibodies. For competition, an excess of unlabeled double-stranded oligonucleotides was added 
together with the labeled probe. The binding reaction was loaded onto a 4% polyacrylamide gel 
(acrylamide/bis-acrylamide, 19:1) prepared in 0.5X TBE buffer. Following electrophoresis, gels 
were dried, and exposed to X-ray film. All experiments were repeated at least three times. 

Methylation interference assay. 

[0064] The upper and the lower strands of the Xbra-WT probe were labeled separately 
and annealed with excess of complementary DNA strand. The probes were precipitated and treated 
with di-methyl-sulfate (8). The methylated probe (10^5 Cerenkov counts) was incubated in a 10X 
gel retardation reaction (see above) (200 µl final volume) with 10 µg of total cell extract from 
COS1 cells expressing either SIP1FS or SIP1CZF. After 20 min. of incubation at 25°C, the products 
were loaded onto a 4% polyacrylamide gel, and electrophoresis was performed as for the gel 
retardation assay. Subsequently, the gel was blotted onto DEAE-cellulose membrane; the transfer 
was performed at 100 V for 30 min. in 0.5X TBE buffer. The membrane was then exposed for one 
hour, and the bands corresponding to the SIP1FS (or SIP1CZF) complex and the free probe were eluted at 
65°C, using high salt conditions (1 M NaCl, 20 mM Tris, pH 7.5, 1 mM EDTA). The eluted DNA 
was precipitated and treated with piperidine (18). After several cycles of solubilization in water 
and evaporation of the liquid under vacuum, the resulting DNA pellet was dissolved in 10 µl of 
sequencing buffer (97.5% de-ionized formamide, 0.3% each bromophenol blue and xylene 
cyanol, 10 mM EDTA) and denatured for 5 min. at 85°C. The same amount of counts (1,500 
Cerenkov counts) for the free probe and the bound probe was loaded onto a 20% polyacrylamide- 
8 M urea sequencing gel. The gel was run in 0.5X TBE for one hour at 2,000 V. Thereafter, the gel 
was fixed in 50% methanol/10% acetic acid and dried. The gel was then exposed for 
autoradiography. 

Western blot analysis. 

[0065] Transfected cells were washed with PBS-0 (137 mM NaCl, 2.7 mM KCl, 6.5 
mM Na2HPO4, 1.5 mM KH2PO4), collected in detachment buffer 

EDTA, 10% glycerol, with protease inhibitors (Protease inhibitor Cocktail tablets, Boehringer 
Mannheim)) and pelleted by low spin centrifugation. The cells were then solubilized in 10 mM 
Tris, pH 7.4, 125 mM NaCl, 1% Triton X-100. For direct electrophoretic analysis, gel sample 
buffer was added to the cell lysates and the samples were boiled. For other experiments, lysates 
were first subjected to immunoprecipitation with either anti-Myc or anti-FLAG antibodies. 
Antibodies were added to aliquots of the cell lysates, which were incubated overnight at 4°C. The 
antibodies and the bound protein(s) of the cell lysate were coupled as a complex to protein A- 
Sepharose for 2 hours at 4°C. The immunoprecipitates were washed 4 times in NET buffer (50 
mM Tris pH 8.0, 150 mM NaCl, 0.1% NP40, 1 mM EDTA, 0.25% gelatin), resolved by SDS- 
polyacrylamide (7.5%) gel electrophoresis, and electrophoretically transferred to nitrocellulose 
membranes. Membranes were blocked for 2 hours in TBST (10 mM Tris pH 7.5, 150 mM NaCl, 
0.1% TWEEN-20) containing 3% (w/v) non-fat milk, and incubated with primary antibody 
(1 µg/ml) for 2 hours, followed by secondary antibody (0.5 ng/ml) linked to horseradish 
peroxidase. Immunoreactive bands were detected with an enhanced chemiluminescence reagent 
(NEN). 

Xenopus laevis transgenesis and whole-mount in situ hybridization 

[0066] Xenopus embryos transgenic for Xbra2-GFP were generated as described 
previously (Kroll and Amaya, 1996), with the following modifications. A Drummond Nanoinject 
was used for injecting a fixed volume of 5 nl of sperm nuclei suspension per egg, at a theoretical 
concentration of 2 nuclei per 5 nl. NotI was used for plasmid linearization and nicking of sperm 
nuclei. Approximately 800 eggs were injected per egg extract incubation. The procedure resulted 
in a successful cleavage of the embryo with rates between 10% and 30%. Of these, 50 to 80% 
completed gastrulation, and 20 to 30% developed further into normal swimming tadpoles, if 
allowed. The transgenic frequency, as analyzed by expression, varied between 50 and 90%. Embryos 
were staged according to Nieuwkoop and Faber (1967). A minimum of 30 expressing embryos were 
analyzed per construct and shown stage. Whole-mount in situ hybridization for the GFP reporter 
gene was as described previously (Latinkic et al., 1997). After color detection, embryos were 
dehydrated and cleared in a 2:1 mixture of benzyl alcohol/benzyl benzoate. 

[0067] Table 1 lists the probes used herein. (See also the Sequence Listing, which is 
incorporated herein.) The "Spacing" column is the number of 
nucleotides present between two CACCT (SEQ ID NO:1) sequences. In the corresponding Table 1 
of the incorporated parent PCT International Patent application, the CACCT (SEQ ID NO:1) 
sequences are highlighted in bold. In that Table, the underlined gaps correspond to deletions of 
nucleotides from the wild type probes. For some probes, only the residues that were changed in 
comparison to the wild type probes were indicated in order to facilitate interpretation of the 
introduced mutations. 



TABLE 1. 

OLIGO             SEQUENCE          SPACING 
Xbra-WT           SEQ ID NO:10      24 
Xbra-D            SEQ ID NO:11 
Xbra-E            SEQ ID NO:12 
Xbra-F            SEQ ID NO:13 
Rdm + Xbra-E      SEQ ID NO:14 
Xbra-F + AREB6    SEQ ID NO:15      23 
Rdm + AREB6       SEQ ID NO:16 
Xbra-J            SEQ ID NO:17 
Xbra-K            SEQ ID NO:18 
Xbra-L            SEQ ID NO:19 
Xbra-M            SEQ ID NO:20 
Xbra-N            SEQ ID NO:21 
Xbra-O            SEQ ID NO:22 
Xbra-P            SEQ ID NO:23 
Xbra-Q            SEQ ID NO:24 
Xbra-R            SEQ ID NO:25 
Xbra-S            SEQ ID NO:26 
Xbra-Z            SEQ ID NO:27 
Xbra-B            SEQ ID NO:28      21 
Xbra-C            SEQ ID NO:29      21 
Xbra-U            SEQ ID NO:30      14 
Xbra-EE           SEQ ID NO:31      18 
Xbra-FrF          SEQ ID NO:32      26 
Xbra-FrF          SEQ ID NO:33      24 
Xbra-V            SEQ ID NO:34      24 
Xbra-W            SEQ ID NO:35      24 
α4I-WT            SEQ ID NO:36      34 
α4I-A             SEQ ID NO:37      26 
α4I-B             SEQ ID NO:38 
Ecad-WT           SEQ ID NO:39      44 
Ecad-A            SEQ ID NO:40 
Ecad-B            SEQ ID NO:41 

Further materials and methods: 

[0068] Gel retardation assay with different probes from the Xbra2 promoter: The 
different 32P-labeled Xbra probes (10 pg) were incubated with 1 ng of total protein extract from 
COS1 cells transfected with pCS3-SIP1CZF, with pCS3-SIP1FS or from mock-transfected cells. 

[0069] Two CACCT (SEQ ID NO:1) sites are contacted upon binding of SIP1FS to the 
Xbra2 promoter: Only mutations within the upstream CACCT (SEQ ID NO:1) sequence (as 
revealed by scanning mutagenesis, see Table 1) or the downstream CACCT (SEQ ID NO:1) 
sequence of Xbra-WT abolish SIP1FS binding. Methylation interference assay indicates that SIP1FS 
contacts both CACCT (SEQ ID NO:1) sequences. Xbra-WT probes labeled on either the upper or the 
lower strand were methylated and incubated with total extract from COS1 cells transfected with 
either pCS3-SIP1FS or pCS3-SIP1CZF. The DNA retarded in the shifted complex or the unbound 
DNA (FREE) was purified, cleaved with piperidine and run on a sequencing gel. Guanine 
residues are methylated in the free probe. The upstream and the downstream CACCT (SEQ ID 
NO:1) from the Xbra2 promoter are indicated. 

[0070] Two CACCT (SEQ ID NO:1) sequences are necessary for the binding of SIP1FS 
and δEF1 to the Xbra2, the α4-integrin and the E-cadherin promoters: δEF1 binding to the Xbra2 
promoter; SIP1 and δEF1 binding to the α4-integrin promoter; binding of SIP1 and δEF1 to the 
α4-integrin promoter, including competition with excess of non-labeled wild type and mutated 
binding sites; binding of SIP1 and δEF1 to the E-cadherin promoter. In each binding reaction, 10 
pg of labeled probes were incubated with 1 µg of a total cell protein extract prepared from COS1 
cells transfected with either pCS3-SIP1FS or pCS3-δEF1. In the competition experiments, 5 ng and 
50 ng of unlabeled DNA were added at the same time as the labeled probe. Myc-tag directed 
antibody was added to the binding reaction, and the supershifted complex and the δEF1 and SIP1 
retarded complexes were demonstrated. For the sequences of all probes, see Table 1 and the 
sequence listing. 

[0071] The spacing and the relative orientation of the CACCT (SEQ ID NO:1) 
sequences are not critical for the binding of SIP1FS and δEF1 to the Xbra2 promoter: Ten pg of 
labeled probes were incubated with total cell protein extract prepared from COS1 cells 
transfected with either pCS3-SIP1FS or pCS3-δEF1. We used 10 pg of the Xbra-E probe and 10 pg 
of the Xbra-F probe in the same binding reaction. For reasons of clear and comparative 
presentation, we omitted the free probe from the SIP1 binding reactions. 

[0072] The integrity of both SIP1 zinc finger clusters is necessary for the binding of 
SIP1FS to DNA: Mutations within NZF3, NZF4, CZF2 or CZF3 abolish the DNA-binding activity of 
either the SIP1NZF or SIP1CZF zinc finger clusters. The wild type and mutated zinc finger clusters 
were fused to GST and the fusion proteins were produced in E. coli. After purification, an equal 
amount of each fusion protein (0.1 ng) was incubated with 10 pg of labeled Xbra-E probe. 
Mutations within NZF3, NZF4, CZF2 or CZF3 affect the binding of SIP1FS to the Xbra-WT probe. 
Ten pg of labeled Xbra-WT probe were incubated with 1 ng of a total cell protein extract prepared 
from COS1 cells transfected with either pCS3-SIP1FS, pCS3-SIP1NZF3mut, pCS3-SIP1NZF4mut, 
pCS3-SIP1CZF2mut or pCS3-SIP1CZF3mut. All possible combinations of 2 COS cell extracts (1 µg of 
each) expressing different SIP1 mutants were tested. Myc-tag directed antibody was added to the 
binding reaction and the supershifted complex and the SIP1FS retarded complex are indicated. 
Mutations within NZF3, NZF4, CZF2 or CZF3 abolish the binding of SIP1FS to the α4-integrin 
promoter. Ten pg of labeled α4I-WT probe were incubated with 1 µg of a total cell protein extract 
prepared from COS1 cells transfected with either pCS3-SIP1FS, pCS3-SIP1NZF3mut, 
pCS3-SIP1NZF4mut, pCS3-SIP1CZF2mut or pCS3-SIP1CZF3mut. Myc-tag directed antibodies were added to 
the binding reaction and the supershifted complex and the SIP1FS retarded complex are indicated. 
SIP1 mutants are produced in comparable amounts in COS cells. Ten µg of the COS cell total 
extract were analyzed by Western blotting using the anti-Myc antibody. SIP1 mutant expression 
levels are in fact slightly higher than the SIP1-WT expression level. 

[0073] SIP1FS binds as a monomer to the Xbra-WT probe. 

[0074] Ten pg of labeled Xbra-WT probe were incubated with 1 µg of total cell protein 
prepared from COS1 cells transfected with an equal amount of pCS3-SIP1FS (Myc-tagged) and of 
pCDNA3-SIP1 (Flag-tagged). Anti-Flag and anti-Myc antibodies were added to the binding assay. 
The Flag- and the Myc-supershifted complexes are indicated. 

[0075] The integrity of both zinc finger clusters is necessary for the repressor activity of SIP1. 

[0076] SIP1FS binding to a gel-purified fragment derived from the multiple CACCT- 
containing artificial promoter from reporter plasmid p3TP-Lux. Anti-Myc tag antibody was 
added; the supershifted complex is indicated. A co-transfection assay of pCS3-SIP1FS, pCS3-CZF3- 
Mut or pCS3-NZF3-Mut together with the p3TP-Lux reporter vector was conducted. The activity is 
expressed as a percentage of full SIP1FS repressor activity, which is 100%. 

[0077] Ectopic activity of the mutated Xbra2 promoter variants (Xbra2-Mut) in 
transgenic frog embryos: SIP1FS binding to the wild-type and mutated Xbra2 promoter elements. 
Whole-mount in situ hybridization for GFP mRNA of Xenopus embryos transgenic for a wild-type 
or point-mutated 2.1 kb Xbra2 promoter fragment driving a GFP reporter. All embryos were fixed 
at stage 11 and cleared for better visualization of the signal. Percentages are indicative of 
intermediary phenotype (i.e., 35% of transgenic embryos displayed the normal Xbra expression 
pattern and 65% showed ectopic expression). 

SIP1 has a structure similar to δEF1 

[0078] SIP1 was recently isolated as a Smad-binding protein. It binds Smad1, Smad5 
and Smad2 in a ligand-dependent fashion (in BMP and activin pathways) (34). SIP1 is a new 
member of the family of two-handed zinc finger/homeodomain transcription factors, which 
includes vertebrate δEF1 and Drosophila Zfh-1 (4, 5). Like these, SIP1 contains two widely 
separated zinc finger clusters. One cluster of four zinc fingers (3 CCHH and 1 CCHC fingers) is 
located at the protein's N-terminal region and another cluster of three CCHH zinc fingers is 
present at the C-terminal region (FIG. 1A). Between SIP1 and δEF1, a high degree of sequence 
identity is apparent within the N-terminal zinc finger cluster (87%) and the C-terminal zinc finger 
cluster (97%) (see, FIG. 1B), whereas the two proteins are less conserved in the regions outside the 
zinc finger clusters (34). Therefore, we assumed that SIP1 and δEF1 would bind to very similar 
sequences. In addition, the N-terminal and C-terminal zinc finger clusters of δEF1 bind to very 
similar sequences. Within the N-terminal cluster, both δEF1NZF3 and δEF1NZF4 are the main 
determinants for binding to the CACCT (SEQ ID NO:1) consensus sequence, and δEF1CZF2 and 
δEF1CZF3 are required for the binding of the C-terminal cluster (10). Moreover, the 
δEF1NZF3+NZF4 domain shows high homology (67%) with the δEF1CZF2+CZF3 domain, and both bind 
to similar consensus target sites on DNA (FIG. 1C). All the residues essential for binding, and 
which are conserved between δEF1NZF3+NZF4 and δEF1CZF2+CZF3, are also conserved between 
SIP1NZF3+NZF4 and SIP1CZF2+CZF3. Taken together, these comparisons suggest that the N- and C- 
terminal zinc finger clusters of SIP1 would also bind to very similar target sequences. 

[0079] Two CACCT (SEQ ID NO:1) sites are necessary for the binding of SIP1 to the 
Xbra2 promoter: SIP1 represses Xbra2 expression when overexpressed in the Xenopus embryo 
(34). The Xbra2 promoter contains several CACCT (SEQ ID NO:1) sequences, two of which are 
localized in a region (-381 to -231) necessary for the induction by activin (16). These two sites, 
an upstream CACCT (SEQ ID NO:1) and a downstream AGGTG (SEQ ID NO:3), 
are separated by 24 bp. To further elucidate the binding requirements of SIP1 to these sites, a 
corresponding 50 bp-long oligonucleotide (Xbra-WT) was used as a probe in electrophoretic 
mobility shift assays (EMSAs). The Xbra-D probe, which contains a mutation of the downstream 
AGGTG (SEQ ID NO:3) site to AGATG, was also included. A similar mutation was previously 
shown to abolish the binding of δEF1 to the κE2 enhancer (30). In addition, we also tested the 
downstream site (probe Xbra-E) and the upstream site (probe Xbra-F) independently as shorter 
probes. These probes were incubated with total extracts of COS cells expressing the Myc-tagged 
C-terminal zinc finger cluster of SIP1 (SIP1CZF), the Myc-tagged N-terminal zinc finger cluster of 
SIP1 (SIP1NZF), or Myc-tagged full size SIP1 (SIP1FS). 

[0080] When mock-transfected COS cells are used as control with the A probe, two 
weak complexes and one strong complex are visualized. Using competitor oligonucleotides, the 
two weak complexes turned out to be non-specific, whereas the strong, fast migrating complex 
shows specificity for binding to the Xbra probe. The latter observation suggests that COS cells 
contain an endogenous protein that can bind to the Xbra-WT probe. When SIP1CZF is present in the 
extract, we observed a strong and slow migrating complex, in addition to the endogenous binding 
activity from the COS extract. This complex could be supershifted with an anti-Myc antibody, 
which confirms that it results from binding of SIP1CZF to the Xbra-WT probe. Mutation of the 
downstream site (Xbra-D probe) strongly affected the formation of this SIP1CZF complex. 
Moreover, SIP1CZF binds to the Xbra-E probe, but not to the Xbra-F probe, indicating that the 
downstream site is essential for binding of SIP1CZF, and SIP1CZF may exclusively bind to this site. 
The strong complex visualized with the Xbra-F probe was also present in SIP1FS extracts and in 
mock extract, and originates from a hitherto uncharacterized endogenous COS cell protein binding 
to the Xbra-F probe. In addition, COS cell extracts containing SIP1NZF displayed binding 
patterns in EMSAs similar to those obtained with SIP1CZF. It is apparent that, as in δEF1 (10), 
both zinc finger clusters of SIP1 have similar DNA binding features. 

[0081] A strong complex, corresponding to SIP1FS, is also generated with the Xbra-WT 
probe. It should be noted that the SIP1CZF production level in COS cells is approximately 50-fold 
higher than the SIP1FS level. For each EMSA reaction, we used the same amount of total 
cell proteins. The binding of SIP1FS to the Xbra-WT probe is as strong as the binding of SIP1CZF. 
Interestingly, this indicates that the affinity of SIP1FS for Xbra-WT is at least 50 times higher than 
that of SIP1CZF. 
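
The inference can be made explicit with a back-of-the-envelope estimate. This is only a sketch under an assumption that is not stated in the source: far from saturation, the amount of shifted probe B is taken to be roughly proportional to the product of an apparent affinity K and the amount of protein [P] present in the extract. The symbols B, K and [P] are introduced here for illustration.

```latex
B \propto K\,[P], \qquad
B_{FS} \approx B_{CZF}, \quad [P_{CZF}] \approx 50\,[P_{FS}]
\;\Longrightarrow\;
\frac{K_{FS}}{K_{CZF}} \approx \frac{[P_{CZF}]}{[P_{FS}]} \approx 50 .
```

If the SIP1FS signal is equal to or stronger than the SIP1CZF signal, the same proportionality gives a ratio of 50 or more, which matches the "at least 50 times" wording.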

[0082] The SIP1FS complex, like the SIP1CZF and SIP1NZF complexes, is absent when using the 
mutated Xbra-D probe. Thus, an intact downstream site is again required for the binding of SIP1FS. 
In contrast to SIP1CZF and SIP1NZF, which bind with similar affinities to the Xbra-WT and Xbra-E 
probes, SIP1FS does not bind to the Xbra-E probe. Like SIP1CZF and SIP1NZF, SIP1FS does not bind 
to the Xbra-F probe. We conclude that the downstream site (AGGTG (SEQ ID NO:3)) is necessary 
for SIP1FS to bind to the Xbra2 promoter. However, this site is not sufficient because additional 
sequences upstream of the Xbra-E probe are necessary for the binding of SIP1FS. One of the 
reasons for which SIP1FS was unable to bind to the Xbra-E probe may simply be the length of the 
Xbra-E probe, because it is shorter than the Xbra-WT probe. To test this, we prepared a probe 
containing a random sequence (Rdm) upstream of the Xbra-E probe (Table 1) in order to extend it 
to the same length as Xbra-WT. In contrast to SIP1CZF, which bound efficiently to the Rdm+Xbra-E 
probe, SIP1FS was unable to bind. This result demonstrates that the length of the Xbra-E probe per se 
is not the cause of the failure of SIP1FS to bind to this probe. 

[0083] To substantiate that the Xbra-F oligonucleotide also contains sequences 
necessary for the binding of SIP1FS, we fused this oligonucleotide as well as a random sequence 
upstream of another CACCT (SEQ ID NO:1) site known to be bound strongly by AREB6 protein 
(Ref. 10) (probes Xbra-F + AREB6 and Rdm + AREB6, respectively). SIP1CZF binds, with equal 
affinity, both the Xbra-F + AREB6 and Rdm + AREB6 probes, indicating that the AREB6 
sequence is also recognized by SIP1CZF. However, SIP1FS binds only to the Xbra-F + AREB6 
probe and not to Rdm + AREB6. This observation confirms that the Xbra-F oligonucleotide 
contains sequences necessary for the binding of SIP1FS. In addition, the only common feature 
between the Xbra-E and the AREB6 probe is the CAGGTGT sequence, suggesting that no 
sequences other than this CAGGTGT in the Xbra-E probe are necessary for the binding of SIP1FS. 

[0085] To map the sequences within Xbra-F that, in conjunction with the Xbra-E 
sequence, are required for the binding of SIP1FS, we prepared a series of probes, identical in length 
to Xbra-WT, containing adjacent triple mutations within the Xbra-F part (see, Table 1). Only three 
of these mutated probes (i.e., Xbra-L, Xbra-M and Xbra-N) affected the binding of SIP1FS. Indeed, 
the upstream CACCT (SEQ ID NO:1) sequence, which is intact in the Xbra-F probe, was modified 
in the L, M and N probes. We also showed that SIP1FS does not bind to the Xbra-S probe, which 
contains a point mutation changing the upstream CACCT (SEQ ID NO:1) into CATCT. This 
mutation is similar to the downstream AGATG mutation made within the Xbra-D probe. 

[0086] The results described above indicate that SIP1FS contacts both CACCT 
(SEQ ID NO:1) sequences in the Xbra promoter. To further investigate the importance of these 
sites, a DNA methylation interference assay was carried out. The methylation of three Gs of the 
downstream AGGTG (SEQ ID NO:3) (SIPDO) and of the two Gs of the upstream CACCT (SEQ ID 
NO:1) (SIPUP) was significantly lower in the SIP1FS-bound versus unbound probe, suggesting that 
the methylation of these Gs interfered with the binding of SIP1FS. This finding strongly supports 
that these residues are essential for SIP1FS binding. It has also been observed that the methylation 
of one of the 2 Gs localized very close to the SIPDO also interfered with the binding of SIP1FS. 
Consequently, it has been shown that for SIP1FS two CACCT (SEQ ID NO:1) sequences and 
their integrity are required for DNA binding. 

[0087] SIP1 and δEF1 require two CACCT (SEQ ID NO:1) sequences for binding to 
different potential candidate sites: SIP1 and δEF1 have a very similar structure with two very highly 
conserved zinc finger clusters and it is likely that these two proteins bind DNA in a similar way. 
We set out to determine whether δEF1 also binds to the Xbra2 promoter by contacting both 
CACCT (SEQ ID NO:1) sequences. Myc-tagged δEF1 was expressed in COS cells and the 
corresponding nuclear extracts were tested in EMSA with WT and a panel of mutated Xbra probes. 
δEF1 binds strongly to the Xbra-WT probe that contains both CACCT (SEQ ID NO:1) sites. 
However, like SIP1FS, δEF1 binds neither the Xbra-E probe comprising only the downstream 
CACCT (SEQ ID NO:1) site nor the Xbra-F probe containing only the upstream CACCT (SEQ ID 
NO:1) site. In addition, the point mutation of either the upstream CACCT (SEQ ID NO:1) (Xbra- 
S) or the downstream CACCT (SEQ ID NO:1) site (Xbra-D) also abolished the binding of δEF1. 
Therefore, like SIP1FS, full length δEF1 also requires the integrity of both CACCT (SEQ ID NO:1) 
sequences for binding to the Xbra2 promoter. The fact that two CACCT (SEQ ID NO:1) sites are 
required for the binding of SIP1FS as well as δEF1 may be unique to the Xbra2 promoter. 
Therefore, the next question was whether two CACCT (SEQ ID NO:1) sequences are 
also necessary for SIP1/δEF1 binding to other target sites. Putative δEF1 and SIP1 binding 
elements are present in several promoters. One putative δEF1 binding element, indeed containing 
two intact and spaced CACCT (SEQ ID NO:1) sites, was found within the promoter of the human 
α4-integrin gene (23). Interestingly, both sites are contained within E2 boxes. Mutation of these 
two CACCT sites led to the de-repression of α4-integrin gene expression in myoblasts, 
suggesting that δEF1 is a repressor of α4-integrin gene transcription (23). Since these two CACCT 
(SEQ ID NO:1) sites are closely positioned in the promoter (spacing is 34 bp), we investigated 
whether both CACCT (SEQ ID NO:1) sequences are required for the binding of δEF1. For this 
purpose, a 60 bp-long probe overlapping both CACCT (SEQ ID NO:1) sites of the α4-integrin 
promoter was synthesized (α4I-WT), as well as two mutated versions, i.e., having a point mutation 
in either the upstream (α4I-B) or the downstream CACCT (SEQ ID NO:1) site (α4I-A), 
respectively (see Table 1). These probes were tested for binding in EMSAs with COS cell extracts 
of either δEF1 or SIP1FS transfected cells. Both δEF1 and SIP1FS form strong complexes with the 
α4I-WT probe. The δEF1 complex was entirely supershifted with an anti-Myc antibody, 
demonstrating its specificity. The binding of both SIP1 and δEF1 is abolished or strongly 
affected by a mutation of either the upstream or the downstream CACCT (SEQ ID NO:1) site. 
Moreover, competition experiments revealed that 50 ng of unlabeled α4I-WT probe was sufficient 
to abolish the binding of SIP1 or δEF1 to the α4I-WT probe, whereas 50 ng of either unlabeled 
α4I-A or α4I-B probe was not. We concluded that both SIP1FS and δEF1 require the integrity of 
two CACCT (SEQ ID NO:1) sites for binding to the promoter of the α4-integrin gene. 

[0088] We also found two closely positioned CACCT (SEQ ID NO:1) sites within the 
promoter of the human E-cadherin gene. An oligonucleotide comprising both CACCT (SEQ ID 
NO:1) sites of this E-cadherin promoter was used as a probe (Ecad-WT) together with SIP1FS or 
δEF1 extracts in EMSAs. Both SIP1FS and δEF1 form a complex with this probe. However, 
when either the upstream (Ecad-A probe) or the downstream (Ecad-B probe) CACCT (SEQ ID 
NO:1) site was mutated, the binding of SIP1FS and δEF1 was abolished. This finding also suggests 
that the two CACCT (SEQ ID NO:1) sites in this promoter represent a high affinity site for the 
binding of two-handed zinc finger/homeodomain transcription factors. 

[0089] From the alignment of the Xbra-WT, α4I-WT and Ecad-WT probes (see Table 
1) we observed no obvious homology, except for one CACCTG (SEQ ID NO:2) site and a second 
CACCT (SEQ ID NO:1) site. Our results described herein and this alignment indicate that only 
those sequences participate in the binding of either SIP1FS or δEF1. We therefore conclude that 
for binding to target promoters, SIP1FS or δEF1 require at least one CACCT (SEQ ID NO:1) site 
and one CACCTG (SEQ ID NO:2) site. 
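
The stated requirement can be turned into a simple sequence check. The sketch below is an illustration only and is not a predictor of binding: the function names and test strings are hypothetical, and the rule is reduced to counting CACCT-type sites (CACCT or its reverse complement AGGTG) and CACCTG-type sites (CACCTG or its reverse complement CAGGTG) on one DNA molecule.

```python
def count_sites(seq: str):
    """Count CACCT-type and CACCTG-type sites on both strands of a DNA string.

    AGGTG and CAGGTG are the reverse complements of CACCT and CACCTG, so they
    mark sites carried by the lower strand.
    """
    seq = seq.upper()
    cacct = seq.count("CACCT") + seq.count("AGGTG")
    cacctg = seq.count("CACCTG") + seq.count("CAGGTG")
    return cacct, cacctg


def meets_two_site_rule(seq: str) -> bool:
    """Toy check: at least two CACCT-type sites, at least one of them CACCTG."""
    cacct, cacctg = count_sites(seq)
    return cacct >= 2 and cacctg >= 1


# Hypothetical sequences, not the probes of Table 1.
two_sites = "TTCACCTAAAAAAAACAGGTGTT"      # CACCT plus a CACCTG-type site
one_site = "TTCAGGTGTT"                    # a single CACCTG-type site
no_terminal_g = "TTCACCTAAAAAAAAAGGTGATT"  # two CACCT-type sites, no CACCTG

for name, seq in [("two_sites", two_sites), ("one_site", one_site),
                  ("no_terminal_g", no_terminal_g)]:
    print(name, meets_two_site_rule(seq))
```

Because every CACCTG site contains a CACCT, the CACCTG-type site is included in the CACCT-type count, so the rule amounts to "two sites, one of which carries the terminal G".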

[0090] Spacing variations and orientation of the CACCT (SEQ ID NO:1) sites: Within 
the Xbra-WT, α4I-WT and Ecad-WT probes (Table 1), the spacing between the two CACCT 
(SEQ ID NO:1) sequences was 24, 34, and 44 bp, respectively. Since SIP1FS and δEF1 bind 
efficiently to these probes, this demonstrates that these proteins can accommodate spacing between 
the two CACCT (SEQ ID NO:1) sites ranging from 24 bp to at least 44 bp. To further investigate 
whether the spacing between the two CACCT (SEQ ID NO:1) sites is an important parameter for 
binding, we generated different Xbra probes with deletions between these sites. Two mutant 
probes (Xbra-B and Xbra-C) each have a deletion of 3 nucleotides, and a third (Xbra-U) has a 
deletion of 10 nucleotides. These probes were tested in EMSA with cell extracts from COS cells 
expressing either SIP1FS or δEF1. Both SIP1FS and δEF1 bind with equal affinity to the Xbra-WT, 
Xbra-B, Xbra-C and Xbra-U probes. As already suggested by the results shown for different 
promoters, this indicates that also within the same promoter element, the spacing between the two 
CACCT (SEQ ID NO:1) sites is not a critical parameter for the binding of these two transcription 
factors. 

[0091] By extensive comparison of the Xbra-WT, α4I-WT and Ecad-WT probes, we 
observed that in the case of the Xbra-WT and α4I-WT probes, the orientation of the two CACCT 
(SEQ ID NO:1) sites is CACCT-N-AGGTG (SEQ ID NO:1 and SEQ ID NO:3 separated by N), 
whereas in Ecad-WT the orientation is AGGTG-N-CACCT (SEQ ID NO:3 and SEQ ID NO:1 
separated by N). Because the CACCT (SEQ ID NO:1) site is non-palindromic, these 
two arrangements could be assumed to be substantially different. However, SIP1FS and δEF1 bind 
to these differently oriented sites with comparable affinities, suggesting that SIP1FS and δEF1 can 
bind irrespective of the orientation of the two CACCT (SEQ ID NO:1) sites. 
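
The two arrangements can be read directly off the upper strand of a probe. The following sketch is illustrative only: the function names and example strings are hypothetical, and it reports the first two sites in 5' to 3' order and whether they lie on opposite strands.

```python
import re

# CACCT as read on the upper strand, or AGGTG when the site lies on the lower strand.
SITE = re.compile(r"CACCT|AGGTG")


def arrangement(seq: str) -> str:
    """Describe the first two sites on the upper strand, e.g. 'CACCT-N-AGGTG'."""
    hits = [m.group() for m in SITE.finditer(seq.upper())]
    if len(hits) < 2:
        raise ValueError("fewer than two CACCT/AGGTG sites found")
    return f"{hits[0]}-N-{hits[1]}"


def is_inverted(seq: str) -> bool:
    """True when the two sites lie on opposite strands (inverted repeat)."""
    first, second = arrangement(seq).split("-N-")
    return first != second


# Hypothetical sequences, not the probes of Table 1.
xbra_like = "TTCACCTAAAAAAAAGGTGTT"
ecad_like = "TTAGGTGAAAAAAACACCTTT"
print(arrangement(xbra_like), arrangement(ecad_like))
```

A tandem repeat such as two CACCT sites on the same strand would return False from `is_inverted`, which corresponds to the distinction drawn between tandem and inverted repeat probes.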

[0092] To further investigate the orientation of the two CACCT (SEQ ID NO:1) sites 
with respect to the DNA binding capacity of SIP1FS and δEF1, additional probes were designed. 
Probe Xbra-EE contained a tandem repeat of the Xbra-E probe, whereas probe Xbra-ErE contained 
an inverted repeat of the same Xbra-E sequence. In addition, we synthesized Xbra-V, in which the 
upstream CACCT (SEQ ID NO:1) site (plus one extra base pair on each side) was replaced by the 
downstream AGGTG (SEQ ID NO:3) sequence and vice versa. Finally, in the Xbra-W probe, only 
the downstream site was replaced by the upstream CACCT (SEQ ID NO:1) sequence. All these 
probes were again tested in EMSAs with extracts prepared from COS cells expressing either 
SIP1FS or δEF1. We observed the strongest binding of SIP1FS or δEF1 to the Xbra-EE probe. 
Therefore, SIP1FS and δEF1 cannot bind to Xbra-E, containing a single CACCT (SEQ ID NO:1) 
site, but bind strongly when this sequence is duplicated, again indicating the requirement of 2 
CACCT (SEQ ID NO:1) sites. In addition, it is evident that these two sites have to be present on 
the same DNA fragment and not on two separate strands (see below). SIP1 and δEF1 bind to 
Xbra-ErE, also suggesting that the respective orientation of the two CACCT (SEQ ID NO:1) sites 
is not critical for binding. Furthermore, switching both the upstream and the downstream sites 
(probe Xbra-V) or replacing only the upstream site by a second copy of the downstream site (probe 
Xbra-W) did not have an effect on the binding of SIP1FS or δEF1. We conclude 
that neither the spacing between the two CACCT (SEQ ID NO:1) sites nor the respective 
orientation of these two sites is critical for the binding of two-handed zinc finger/homeodomain 
transcription factors in vitro. 

[0093] Surprisingly, not all ... 
factors. In fact, duplication of the Xbra-F sequence, which in combination with the Xbra-E 
sequence was shown to be necessary for the binding of SIP1FS and δEF1, is refractory to binding of 
SIP1FS and δEF1. This suggests that the CACCT (SEQ ID NO:1) site within the Xbra-F context is 
a low affinity site and that sequences adjacent to this CACCT (SEQ ID NO:1) site may optimize 
the affinity. In addition, the fact that neither the C-terminal cluster nor the N-terminal cluster can 
bind independently to the Xbra-F probe confirms the assumption that this site displays low affinity. 
In contrast, the CACCTG (SEQ ID NO:2) site present in the Xbra-E probe can bind SIP1CZF and 
SIP1NZF, and a duplication of this element creates a high affinity binding site for both SIP1FS and 
full length δEF1. This suggests that the terminal G base in the downstream site may also allow 
discrimination between a high and a low affinity binding site. However, the CACCT (SEQ ID 
NO:1) site in Xbra-F may only bind one of the zinc finger clusters of SIP1FS once the other cluster 
has occupied the neighboring high affinity CACCTG (SEQ ID NO:2) site (in Xbra-E). To confirm 
the importance of the terminal G base residue for the binding of SIP1FS and δEF1, we mutated the 
downstream CACCTG (SEQ ID NO:2) site to CACCTA (probe Xbra-Z). The binding of SIP1FS or 
δEF1 to the Xbra-Z probe decreased strongly (compared with the Xbra-WT probe), suggesting that 
this G base residue is important for generating a high affinity binding site for both SIP1FS and 
δEF1. 

[0094] Finally, when Xbra-E and Xbra-F probes are mixed before adding SIP1FS or 
δEF1, no binding is observed, again indicating that both CACCT (SEQ ID NO:1) sites have to be 
in the cis configuration, i.e., on the same DNA. 

[0095] SIP1 and δEF1 bind to DNA elements containing two CACCT (SEQ ID NO:1) 
sites and both of these proteins contain two clusters of zinc fingers capable of binding 
independently to CACCT (SEQ ID NO:1) sites. In subsequent work, we evaluated the importance 
of each zinc finger cluster for the binding of SIP1FS to DNA. Mutations destroying either the third 
or the fourth zinc finger of the N-terminal cluster of δEF1 (δEF1NZF) were shown to abolish the 
binding of this cluster to the CACCT (SEQ ID NO:1) site. Similarly, mutations in the second or 
third zinc finger of the C-terminal cluster also abolished the binding of δEF1CZF to CACCT (SEQ 
ID NO:1) (10). Therefore, we introduced in the SIP1NZF and SIP1CZF clusters mutations similar to 
those in δEF1. These mutated and wild type clusters were fused to GST and the fusion proteins 
were purified from bacteria. We demonstrated that the wild type cluster/GST fusion proteins bind 
to the Xbra-E probe. However, with the same amount of purified mutant cluster/GST fusion 
proteins (GST-NZF3, GST-NZF4, GST-CZF2 and GST-CZF3), no binding to the Xbra-E probe could 
be detected with any of these fusion proteins. Indeed, these mutations also abolish the capacity of 
each cluster (SIP1NZF and SIP1CZF) to bind independently to a CACCT (SEQ ID NO:1) site. 

[0096] We then introduced similar mutations in full size SIP1 (NZF3-Mut, NZF4-Mut, 
CZF2-Mut and CZF3-Mut), and over-expressed these SIP1 mutants in COS cells as Myc-tagged 
proteins. The expression of the different mutants was established and normalized by Western blot 
analysis using anti-Myc antibody. By means of EMSAs, we observed that WT SIP1 binds strongly 
to the Xbra-WT probe, and that the SIP1 complex is super-shifted upon incubation with an anti- 
Myc antibody. In contrast, none of the mutant forms of full size SIP1 was able to form a SIP1-like 
complex or a SIP1 super-shifted complex. The same observations were made when the α4I-WT 
probe was used. In conclusion, full size SIP1 requires the binding capacities of both 
intact zinc finger clusters to bind to its target, which necessarily contains 2 CACCT (SEQ ID 
NO:1) sites. The effect of these mutations on the repressor activity of SIP1 was tested in a 
transfection assay using the p3TP-Lux reporter plasmid. This plasmid contains three copies, 
each of which has one CACCT (SEQ ID NO:1), of a sequence covering the -73 to -42 region of 
the human collagenase promoter (de Groot and Kruijer, 1990). SIP1 bound to a fragment containing 
this multimerized element, but neither NZF3-Mut nor CZF3-Mut was able to bind. Over- 
expression of SIP1 in CHO cells leads to a strong repression of the p3TP-Lux basal transcriptional 
activity. However, the repression was 6 to 7-fold lower upon over-expression of SIP1 mutants 
defective in DNA binding (NZF3-Mut or CZF3-Mut). Therefore the integrity of both zinc finger 
clusters is necessary for both the DNA-binding and optimal, i.e., wild-type, repressor activity of 
SIP1. 
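
As a rough guide to scale, the fold values can be converted to the percentage convention in which full SIP1 repressor activity is 100%. The sketch below assumes that "6 to 7-fold lower" means the mutants retain one sixth to one seventh of the wild-type repression; that reading, and the function name, are assumptions made for illustration and are not reported data.

```python
def residual_repression_percent(fold_reduction: float) -> float:
    """Repressor activity remaining, as a percentage of the wild-type (100%) value."""
    if fold_reduction <= 0:
        raise ValueError("fold reduction must be positive")
    return 100.0 / fold_reduction


for fold in (6, 7):
    pct = residual_repression_percent(fold)
    print(f"{fold}-fold lower -> {pct:.1f}% of wild-type SIP1 repression")
```

Under this reading the DNA-binding-defective mutants would retain roughly one seventh to one sixth of the wild-type repressor activity.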

[0097] SIP1 binds to DNA as a monomer: The observation that the integrity of both 
zinc finger clusters is required for SIP1 binding to two CACCT (SEQ ID NO:1) sequences 
suggests that SIP1 binds as a monomer in which each zinc finger cluster contacts one such site. 
However, it can be hypothesized that SIP1 binds its target sites as a dimer, implying that one of the 
SIP1 molecules of the dimer would bind one CACCT (SEQ ID NO:1) site via its N-terminal zinc 
finger cluster, while the second SIP1 molecule would contact the DNA via its C-terminal zinc 
finger cluster. The zinc finger clusters not 
interacting with the DNA would then be involved in dimerization. Consequently, some 
combinations of NZF and CZF mutants should generate a dimer configuration that binds DNA. In 
none of the combinations of NZF and CZF mutations could binding to the Xbra-WT probe be 
detected. Although we cannot rule out that these mutations also affect potential dimer formation, it 
is highly unlikely that the same mutation affects both the DNA-binding capacity and the 
protein-protein interaction. Moreover, it is highly unlikely that two different mutants (having 
different mutations within a cluster) would behave the same. 

[0098] To address this experimentally, we used a combination of differently tagged 
SIP1 in supershift experiments in EMSAs. First, we produced Myc-tagged and FLAG-tagged 
SIP1FS separately at comparable levels in COS cells, and confirmed that both proteins bind to 
DNA with similar affinities. The SIP1 complex generated with Myc-tagged SIP1 has a slightly 
slower migration than the FLAG-tagged complex (the Myc-tag is longer than the FLAG-tag). 
Extracts prepared from COS cells expressing similar amounts of both Myc-tagged and FLAG- 
tagged SIP1 were incubated with the Xbra-WT probe and used in EMSAs. We observed the 
formation of a broad SIP1 complex that is a combination of the fast migrating FLAG-tagged 
SIP1 complex and the slow migrating Myc-tagged SIP1 complex. Using an anti-FLAG antibody, 
only the lower part of the complex, corresponding to FLAG-tagged SIP1, is super-shifted, whereas 
about 50% of the radioactivity remains within the Myc-tagged SIP1 complex. This indicates that 
the latter SIP1 complex is not super-shifted with the anti-FLAG antibody. Conversely, incubating 
the extract with an anti-Myc antibody super-shifted only the upper part of the complex, 
corresponding to Myc-tagged SIP1, whereas 50% of the radioactivity is retained within the FLAG- 
tagged SIP1 complex. Again, this indicates that no FLAG-tagged SIP1 is super-shifted with an 
anti-Myc antibody. Using both antibodies, we observed the same two super-shifted bands, which 
correspond to the Myc-tagged and the FLAG-tagged super-shifted complexes, in the upper part of 
the gel. If SIP1 dimers were formed, then at least some heterodimers would be assembled 
from Myc-tagged SIP1 and FLAG-tagged SIP1. However, we detected no other super-shifted band 
corresponding to a potential double super-shift, viz. super-shifted with both anti-Myc and anti- 
FLAG antibodies. Hence, this experiment gave no detectable dimer formation between FLAG- 
tagged SIP1 and Myc-tagged SIP1. 
[0099] Finally, FLAG-tagged ... 
the presence of a large excess of DNA binding sites. However, co-immunoprecipitation of Myc- 
tagged SIP1 was not feasible. The reciprocal experiment, i.e., immunoprecipitating with an anti- 
Myc antibody and detection with an anti-FLAG antibody, did not show any SIP1 dimer either. 
Taken together, these observations lead us to conclude that SIP1 binds as a monomer to the Xbra- 
WT probe. 

[0100] Mutations in either the upstream or downstream CACCT (SEQ ID NO:1) lead 
to ectopic activity of the Xbra2 promoter in transgenic frog embryos: SIP1 binds to the Xbra2 
promoter and represses expression of endogenous Xbra2 mRNA when overexpressed in Xenopus 
embryos (Verschueren et al., 1999). To analyze the importance of the CACCT (SEQ ID NO:1) 
sequences in the regulation of the Xbra2 promoter in vivo, we tested whether mutations of these 
would affect Xbra2 promoter activity in transgenic embryos. Xbra2 promoter sequences were 
fused upstream of the green fluorescent protein (GFP) gene and this reporter cassette was used for 
transgenesis. A 2.1 kb-long Xbra2 promoter fragment was shown to be sufficient to drive reporter 
protein synthesis in the same domain of the embryo (85% of the embryos, stage 11, n=57) as 
endogenous Xbra mRNA (which is in the marginal zone), except in the organizer 
region, for which a regulatory element may be lacking in the reporter cassette tested here. 

[0101] A single point mutation within the downstream CACCT (SEQ ID NO:1) site in 
the promoter, which disrupted SIP1 binding (Xbra2-Mut1) and is identical to Xbra-D, had a severe 
effect on spatial production of the reporter protein. All embryos showed ectopic expression in the 
inner ectoderm layer. Mutations within the upstream CACCT (SEQ ID NO:1) sequence (Xbra2- 
Mut4) also affected SIP1 binding. We observed in all transgenic embryos (n>30) the same 
ectopic expression as for the Xbra2-Mut1 mutation. Mutation of the downstream CACCTG (SEQ 
ID NO:2) to CACCTA (Xbra2-Mut2) also affects SIP1 binding to such a probe. This mutation, 
when introduced into the Xbra2 2.1 kb promoter, also led to ectopic expression of GFP mRNA in 
all transgenic embryos tested (n>30). We also tested a mutation (Xbra2-Mut3) that decreased by 3 
bp the original 24 bp spacing between the two CACCT (SEQ ID NO:1) sequences. This mutation 
weakened the interaction of such a probe with SIP1. This was also reflected in the corresponding 
transgenic embryos (n=37): while 35% of the embryos showed the same expression pattern as the 
wild-type Xbra2 2.1 kb promoter fragment, 65% had either patches or weak continuous expression 
in the inner ectoderm layer. 

[0102] A clear correlation existed between the effect of these mutations on SIP1 binding 
affinity in EMSA and the phenotype (ectopic expression of the reporter gene) and its penetrance in 
vivo, indicating the importance of the SIP1 target sites in the normal regulation of Xbra2 
expression in Xenopus development (stage 11). It also suggests that a hitherto unknown Xenopus 
SIP1-like repressor regulates Xbra2 gene expression in vivo. In addition, it confirms that SIP1-like 
factors require two intact CACCT (SEQ ID NO:1) sites for regulating target promoters like Xbra2. 
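
The site arrangement discussed above can be checked directly on the probe sequences. The following is a minimal sketch, not part of the disclosed method: it scans both strands of a probe for CACCT and reports the spacing between the first two sites. The function names are illustrative assumptions; the two probe sequences are copied from the Xbra-WT and Xbra-D entries of the sequence listing.

```python
# Illustrative sketch only: helper names are assumptions, not from the specification.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_sites(seq: str, motif: str = "CACCT"):
    """Return (start, strand) for every motif occurrence on either strand.

    A hit on the '-' strand means the top strand reads as the reverse
    complement of the motif (AGGTG for CACCT).
    """
    seq = seq.upper()
    motif = motif.upper()
    rc = reverse_complement(motif)
    hits = []
    for i in range(len(seq) - len(motif) + 1):
        window = seq[i:i + len(motif)]
        if window == motif:
            hits.append((i, "+"))
        elif window == rc:
            hits.append((i, "-"))
    return hits


def spacing(seq: str, motif: str = "CACCT"):
    """Number of bases between the first two sites, or None if fewer than two."""
    hits = find_sites(seq, motif)
    if len(hits) < 2:
        return None
    return hits[1][0] - (hits[0][0] + len(motif))


# Probe sequences as given in the sequence listing (spaces removed).
XBRA_WT = "atccaggccacctaaaatatagaatgataaagtgaccaggtgtcagttct"
XBRA_D = "atccaggccacctaaaatatagaatgataaagtgaccagatgtcagttct"

if __name__ == "__main__":
    print("Xbra-WT sites:", find_sites(XBRA_WT), "spacing:", spacing(XBRA_WT))
    print("Xbra-D sites:", find_sites(XBRA_D), "spacing:", spacing(XBRA_D))
```

Run on the wild-type probe, the scan finds one site on each strand; on Xbra-D the downstream site is lost, consistent with the point mutation described for that probe.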

[0103] SIP1 induces invasion by down regulation of E-cadherin: SIP1 
represses E-cadherin promoter activity through binding to two conserved E-boxes. To elucidate 
whether SIP1 binding affects the transcriptional activity of the human E-cadherin promoter 
(-308/+41), we transiently co-expressed full-length SIP1 with E-cadherin promoter driven reporter 
constructs in the E-cadherin positive cell lines NMe (mouse), MDCK (dog) and MCF7/AZ 
(human). SIP1 expression led to an 80% decrease of the human E-cadherin promoter activity. To 
address the binding specificity of SIP1 for the two conserved E-boxes, mutagenesis in either the 
upstream E-box1 (-75) or downstream E-box3 (-25), or simultaneously in both E-boxes, was 
performed. When co-transfection was performed with SIP1 cDNA and the mutant E-cadherin 
promoter constructs (68), a de-repression of the human E-cadherin promoter activity was 
consistently shown. In addition, mutated SIP1 constructs were co-transfected with the human E- 
cadherin promoter. Mutation of the N-terminal or C-terminal zinc finger cluster resulted only in a 
slight derepression of the E-cadherin promoter activity. Interestingly, co-transfection of the human 
E-cadherin promoter and a SIP1 double mutant, affected in both zinc finger clusters, resulted in a 
considerable loss of SIP1-mediated repression of E-cadherin promoter activity. We can therefore 
conclude that SIP1 represses the E-cadherin promoter activity by binding to the two E-boxes and that 
the two zinc finger clusters are indeed needed for full repression of the E-cadherin promoter activity. 

[0104] Inducible expression of SIP1 results in dose-dependent loss of E-cadherin 
protein and mRNA. To elucidate whether SIP1 affects the endogenous E-cadherin expression 
levels, E-cadherin positive MDCK-Tetoff cells, with high expression of the tTA transactivator, were 
stably transfected with a plasmid expressing a Myc6-tagged full-length mouse SIP1 cDNA under 
control of a tTA-responsive element. To induce SIP1, cells were grown without tetracycline for 3 
days. Analysis of E-cadherin and SIP1 expression by immunofluorescence of a representative 
cloned transfectant ... 
honeycomb E-cadherin expression pattern at cell-cell contacts. Western blot analysis confirmed 
these results. SIP1 induction occurred at tetracycline concentrations equal to or lower than 2g/ml. As 
the tetracycline concentration was gradually decreased, E-cadherin was more strongly repressed 
and this correlated inversely with SIP1 accumulation. Further, we checked whether catenins, linking E- 
cadherin to the actin cytoskeleton, were influenced by SIP1 expression. Upon Western blotting, 
neither αE-catenin nor β-catenin appeared to be affected, and this was confirmed by 
immunofluorescence. Equal amounts of total RNA of both non-induced and induced cells were 
analyzed by Northern blotting. After hybridization with an E-cadherin-specific probe, the SIP1- 
expressing cells showed almost no E-cadherin mRNA expression, whereas the non-induced cells 
(+Tet) expressed normal amounts of E-cadherin mRNA. These results validate those of the reporter 
assays, as induction of SIP1 expression affects endogenous E-cadherin expression through mRNA 
down-regulation. 

[0105] SIP1 expression in human carcinoma cell lines: We performed Northern blot 
analyses to examine the expression of SIP1 in a panel of E-cadherin-negative and -positive cell 
lines. To avoid possible cross-hybridizations to other members of the δEF1 family, appropriate 
mouse and human SIP1 cDNA fragments were used as probes. We noted a clear-cut, strong 
inverse correlation between SIP1 expression and E-cadherin expression. High expression of SIP1 
was found in human fibroblasts, and the most prevalent expression of SIP1 was found in E- 
cadherin-negative carcinoma cells reported to have a methylated E-cadherin promoter (53). As the 
expression pattern of SIP1 in the described cell lines parallels Snail mRNA expression in 
E-cadherin negative cell lines (66), we looked for Snail expression levels in our conditional SIP1- 
expressing cell line MDCK-Tetoff-SIP1. Snail expression could not be detected after SIP1 
induction. Thus, in our cell system, E-cadherin repression is not Snail related. 

[0106] SIP1 enhances the malignant phenotype by promoting loss of cell-cell adhesion 
and invasion. As E-cadherin is a well-known invasion-suppressor molecule (47), we addressed the 
question whether SIP1 induction switches the cells to a more invasive phenotype. A cell 
aggregation assay was performed on non-induced versus induced MDCK-Tetoff-SIP1 cells. The 
non-induced MDCK-Tetoff-SIP1 cells showed significant aggregation after 30 min, but SIP1 
induction abrogated normal cell-cell aggregation to a similar extent as the E-cadherin blocking 
antibody DECMA-1. Invasi... 
the DECMA-1 antibody. 

[0107] SIP1 expression results in the reduction of unidirectional cell migration. The 
role of E-cadherin in cell migration was demonstrated by blocking E-cadherin with a 
specific antibody, which results in a reduction of unidirectional cell migration (72). The effect of SIP1 
expression on cell migration due to down regulation of E-cadherin was studied in a 
wound assay in the inducible MDCK-Tetoff SIP1-expressing cell line. We could demonstrate that 
induction of SIP1 results in lower unidirectional cell migration. Down regulation of E-cadherin 
mediated cell-cell contact results in the disturbance of unidirectional migration. 

[0108] DISCUSSION: Invasion and metastasis are believed to be the most crucial 
steps in tumor progression. Malignancy of carcinoma cells is characterized by loss of both cell-cell 
adhesion and cellular differentiation and this has been frequently reported to correlate negatively 
with E-cadherin down-regulation. Loss of E-cadherin expression has been attributed to 
transcriptional dysregulation (52, 73). We show here that the zinc finger protein SIP1 represses E- 
cadherin expression at the transcriptional level by binding to the conserved E-boxes present in the 
minimal E-cadherin promoter. The specific binding of SIP1 on the two E-boxes was confirmed by 
mutagenesis of either the zinc finger clusters of SIP1 or the E-box sequences in the E-cadherin 
promoter. Indeed, such mutations resulted in the loss of repression of the E-cadherin promoter 
activity by SIP1. These results are compatible with the finding that comparable mutations of the E- 
boxes resulted in the up regulation of the E-cadherin promoter activity in E-cadherin-negative cell 
lines, where the wild type promoter shows low activity (Refs. 56, 58). Stable transfection of the 
transcriptional repressor SIP1 induces down regulation of E-cadherin at both mRNA and protein 
level. A wound assay demonstrates that SIP1 interferes with the unidirectional migration mediated 
by a functional E-cadherin cell-cell contact. Weaker cell-cell contact results in more multi- 
directional migration of the epithelial cells. A striking correlation between down-regulated E- 
cadherin and up-regulated SIP1 expression was seen in various human tumor cells. Finally, we 
demonstrate here that the down regulation of E-cadherin due to SIP1 expression is also associated 
with a remarkable increase of the invasion capacity. Hence, SIP1 can be considered as an invasion 
inducer due to its binding to the E-cadherin promoter. The fact that the transcriptional repressor 
Snail also specifically binds E-boxes, resulting in transcriptional E-cadherin repression (66, 67), 
raised the question whether ... 
mRNA up-regulation could not be detected in the conditional SIP1-expressing MDCK-Tetoff-SIP1 
cell line. These data led us to consider SIP1 as the effector of transcriptional E-cadherin repression 
in our cell system. This idea was supported by the fact that mutations of the E-boxes have a more 
extensive effect on the decrease of repression of the E-cadherin promoter when cotransfected with 
SIP1. Derepression of the E-cadherin promoter activity, when cotransfected with SIP1, is already 
detected with a single E-box mutation. For Snail cotransfection a clear derepression effect was 
only seen when more E-boxes were mutated in the human E-cadherin promoter (66). The high 
expression of SIP1 in the breast cancer cell lines MDA-MB435S and MDA-MB231 is remarkable. 
These tumor cell lines have been described to bear a hypermethylated E-cadherin promoter (53). 
However, this should not rule out an important role for SIP1 repression of the endogenous E- 
cadherin promoter. Mutations of the E-boxes reactivate the exogenous E-cadherin promoter 
activity strongly in these cell lines. Indeed, recent research made clear that many transcription 
factors function by recruiting multiprotein complexes with chromatin modifying activities to 
specific sites on DNA (74). It was already shown that another Smad-interacting transcription factor 
TGIF associates with histone deacetylase (75). DNA methylation and chromatin condensation 
could therefore act synergistically with histone deacetylation to repress gene transcription (76). 

[0109] Materials and methods - Cell Culture and reagents - The MDCK-Tetoff cell line 
was obtained from Clontech (Palo Alto, CA). This cell line is derived from the Madin Darby 
Canine Kidney (MDCK) Type II epithelial cell line and stably expresses the Tet-off transactivator, 
tTA (77). The MCF7/AZ cell line is derived from MCF7, a human mammary carcinoma cell 
line (78). The NMe cell line is an E-cadherin expressing subclone of NMuMG, an epithelial cell 
line from normal mouse mammary gland (47). MDA-MB231 is a human breast cancer cell line 
(ATCC, Manassas, VA). 

[0110] Plasmids: The full-size mouse SIP1 cDNA sequence was cloned into the Myc- 
tag containing pCS3 eukaryotic expression vector derived from pCS2 (69). The resulting plasmid 
was designated "pCS3-SIP1FS". Remacle et al. (68) described mutagenesis of the zinc finger 
clusters of SIP1. For the construction of the inducible vector pUHD10.3-SIP1, a ClaI/XbaI 
fragment from pCS3-SIP1FS was cloned into the EcoRI/XbaI-cut pUHD10.3 vector (79). The ClaI 
site of the SIP1 fragment and the EcoRI site of the vector were blunted using Pfu polymerase 
(Stratagene, La Jolla, CA). ... 
genomic DNA from the human MCF7/AZ cell line. PCR primers used are: 5'- 
ACAAAAGAACTCAGCCAAGTG-3' (SEQ ID NO:[43]42) and 5'- 
CCGCAAGCTCACAGGTGC-3' (SEQ ID NO:[44]43). The GC-melt kit (Clontech; Palo Alto, 
CA) was used for efficient amplification. The PCR product was blunted, kinased and then cloned 
into the pGL3basic vector (Promega; Madison, WI), which was opened at the SrfI site. By using 
the KpnI-HindIII sites in this luciferase reporter construct, the E-cadherin promoter was also 
transferred to the pGL3enhancer vector. Mutagenesis of the E-boxes in the human E-cadherin 
promoter was performed with the QuickChange Site-Directed Mutagenesis Kit (Stratagene) using the 
following primers: 

forward primer E-box1: 5'-gctgtggccggCAGATGaaccctcag-3' (SEQ ID NO:[45]44); 
reverse primer E-box1: 5'-ctgagggttCATCTGccggccacagc-3' (SEQ ID NO:[46]45); 
forward primer E-box3: 5'-gctccgggctCATCTGgctgcagc-3' (SEQ ID NO:[47]46); 
reverse primer E-box3: 5'-gctgcagcCAGATGagccccggagc-3' (SEQ ID NO:[48]47). 
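
Site-directed mutagenesis primer pairs of this kind are normally exact reverse complements of each other, which is easy to verify before ordering oligonucleotides. The sketch below is an illustration only, assuming full complementarity is intended; the helper names are hypothetical, and the sequences are copied as printed above (upper case marks the mutated E-box).

```python
# Illustrative sketch: checks whether each primer pair, as printed in the text,
# is an exact reverse-complement pair. Helper names are assumptions.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement that preserves upper/lower case."""
    return seq.translate(COMPLEMENT)[::-1]


def is_complementary_pair(forward: str, reverse: str) -> bool:
    """True if the reverse primer is exactly the reverse complement of the forward primer."""
    return reverse_complement(forward).upper() == reverse.upper()


PRIMER_PAIRS = {
    "E-box1": ("gctgtggccggCAGATGaaccctcag", "ctgagggttCATCTGccggccacagc"),
    "E-box3": ("gctccgggctCATCTGgctgcagc", "gctgcagcCAGATGagccccggagc"),
}

if __name__ == "__main__":
    for name, (fwd, rev) in PRIMER_PAIRS.items():
        status = "exact complement" if is_complementary_pair(fwd, rev) else "differs"
        print(f"{name}: forward {len(fwd)} nt, reverse {len(rev)} nt, {status}")
```

As printed, the E-box1 pair is an exact reverse-complement pair, whereas the E-box3 pair differs in length by one nucleotide (24 versus 25, matching the lengths given in the sequence listing); whether that difference is intended cannot be determined from the text.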

[0111] Stable transfection of cells: For stable transfection of the MDCK-Tetoff cell 
line, the LipofectAMINE PLUS™ (Gibco BRL, Rockville, MD) method was used. 2000 cells were 
grown on a 75 cm² Falcon for 24 h and then transfected with 30 µg of pUHD10.3-SIP1 plasmid 
plus 3 µg pPHT plasmid. The latter is a pPNT derivative and confers resistance to hygromycin 
(80). Stable MDCK-Tetoff transfectants, MDCK-Tetoff-SIP1, were selected with hygromycin-B 
(150 units/ml) (Duchefa Biochemie, Haarlem, NL) for a period of 2 weeks. Induction of SIP1 was 
prevented by adding tetracycline (1 µg/µl) (Sigma Chemicals, US). Expression of SIP1 was induced 
by washing away tetracycline at the time of subcloning. Stable clones with reliable induction 
properties were identified by immunofluorescence using anti-Myc tag antibodies. 

[0112] Promoter reporter assays: MCF7/AZ cells were transiently transfected by using 
FuGENE 6 (Roche; Basel, CH). NMe and MDA-MB231 were transfected with the 
LIPOFECTAMINE (Gibco ... was 
transiently transfected with LIPOFECTAMINE PLUS™ (Gibco BRL; Rockville, MD). For 
transient transfection, about 200,000 cells were seeded per 10-cm² well. After incubation for 24 h, 
600 ng of each plasmid type DNA was transfected. The medium was refreshed 24 h after 
transfection. Cells were lysed ... 
Bedford, MA). Normalization of transfection was done by measuring β-galactosidase, encoded by 
the cotransfected pUT651 plasmid (Eurogentec; Seraing, BE). Luciferase substrate was added to 
each sample. For β-galactosidase detection, a chemiluminescent substrate was supplied (Tropix, 
Bedford, MA). Luciferase and β-galactosidase activities were assayed in a Topcount microplate 
scintillation reader (Packard Instrument Co., Meriden, CT). 
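
The normalization step described above amounts to dividing each luciferase reading by the β-galactosidase reading of the same sample, after which repression can be expressed as a percentage decrease relative to the control. The following is a minimal sketch under that assumption; the function names and all numeric readings are hypothetical and are not data from this specification.

```python
# Illustrative sketch: function names and readings are hypothetical.

def normalized_activity(luciferase: float, beta_gal: float) -> float:
    """Luciferase signal corrected for transfection efficiency via beta-galactosidase."""
    if beta_gal <= 0:
        raise ValueError("beta-galactosidase reading must be positive")
    return luciferase / beta_gal


def percent_repression(control: float, treated: float) -> float:
    """Percentage decrease of normalized promoter activity relative to the control."""
    if control <= 0:
        raise ValueError("control activity must be positive")
    return 100.0 * (1.0 - treated / control)


if __name__ == "__main__":
    # Hypothetical raw counts: (luciferase, beta-galactosidase)
    empty_vector = normalized_activity(50_000.0, 1_000.0)
    with_repressor = normalized_activity(12_000.0, 1_200.0)
    print(f"repression: {percent_repression(empty_vector, with_repressor):.1f}%")
```

Normalizing before comparing keeps differences in transfection efficiency between wells from being read as differences in promoter activity.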

[0113] Northern analysis: Total RNA was isolated with the RNeasy kit (Qiagen; 
Chatsworth, CA) following the manufacturer's protocol. Total RNA (25 µg) was glyoxylated, size- 
fractionated on a 1% agarose gel and transferred onto a Hybond-N+ membrane (Amersham 
Pharmacia Biotech, Rainham, UK). Hybridizations were performed as described before (81). The 
mouse SIP1 probe (459 bp) was generated by an EcoRI digest of the mouse SIP1 cDNA. The 
human SIP1 probe (707 bp) was created by a BstEII-NotI digest of the KIAA0569 clone (Kazusa 
DNA Research Institute). The mouse E-cadherin probe used was a SacI fragment (500 bp) of the 
mouse E-cadherin cDNA. Two degenerate primers, 5' 
CTTCCAGCAGCCCTACGAYCARGCNCA 3' (SEQ ID NO:[49]48) and 5' 
GGGTGTGGGACCGGATRTGCATYTTNAT 3' (SEQ ID NO:[50]49), were used to amplify a 
fragment of the dog Snail cDNA from a total cDNA population of the MDCK cell line. Cloning 
and sequencing of the amplified band revealed a 432 bp cDNA fragment. To control for the amount of 
loaded RNA, a GAPDH probe was used on the same blot. We performed the quantification of the 
radioactive bands on a PhosphorImager 425 (BioRad, Richmond, CA). 

[0114] Immunofluorescence assays and Antibodies: Cells of interest were grown on 
glass coverslips. Fixation was by standard procedures (82). The following antibodies were used: 
the rat monoclonal antibody DECMA-1 (Sigma; Irvine, UK) recognizing both mouse and dog E- 
cadherin, and the mouse anti-Myc tag antibody (Oncogene, Cambridge, MA). Secondary 
antibodies used were Alexa 488-coupled anti-rat Ig and Alexa 594-coupled anti-mouse Ig. 

[0115] Cell Aggregation Assay: Single-cell suspensions were prepared in accordance 
with an E-cadherin-sa... 
1.25 mM Ca²⁺ under gyrotory shaking (New Brunswick Scientific, New Brunswick, NJ) at 80 rpm 
for 30 min. Particle diameters were measured in a Coulter particle size counter LS200 (Coulter, 
Lake Placid, NY) at the start (N0) and after 30 min of incubation (N30) and plotted against 
percentage volume distribution. 

[0116] Collagen Invasion Assay: Six-well plates were filled with 1.25 ml of 
neutralized type I collagen (Upstate Biotechnology, Lake Placid, NY) per well. Incubation for at 
least 1 h at 37°C was needed for gelification. Single-cell suspensions were seeded on top of the 
collagen gel and cultures were incubated at 37°C for 24 h. Using an inverted microscope 
controlled by a computer program, we counted the invasive and superficial cells in 12 fields of 
0.157 mm². The invasion index expresses the percentage of cells invading the gel over the total 
number of cells (84). 
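
The invasion index defined above can be written out as a short calculation. The sketch below assumes the counts from all fields are pooled before the percentage is taken; the function name and the per-field counts are hypothetical and are not results from this specification.

```python
# Illustrative sketch: name and counts are hypothetical; assumes counts from
# all microscope fields are pooled before computing the percentage.
from typing import Sequence


def invasion_index(invasive: Sequence[int], superficial: Sequence[int]) -> float:
    """Percentage of invading cells over the total number of cells counted."""
    if len(invasive) != len(superficial):
        raise ValueError("one invasive and one superficial count is needed per field")
    n_invasive = sum(invasive)
    n_total = n_invasive + sum(superficial)
    if n_total == 0:
        raise ValueError("no cells counted")
    return 100.0 * n_invasive / n_total


if __name__ == "__main__":
    # Hypothetical counts for 12 fields of 0.157 mm² each.
    invasive_per_field = [3, 4, 2, 5, 3, 3, 4, 2, 3, 4, 3, 0]
    superficial_per_field = [9, 8, 10, 7, 9, 9, 8, 10, 9, 8, 9, 12]
    print(f"invasion index: {invasion_index(invasive_per_field, superficial_per_field):.1f}%")
```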

[0117] Wound Assay: The wound assay was performed as described before (85). 
Briefly, wounded monolayers were cultured for 24 h in serum-deprived medium in the presence or 
absence of tetracycline. We assessed cell migration by measuring the distance of the wound. 
Migration results are expressed as the average of the wound-distance. 
[SEQUENCE LISTING 
<110> VLAAMS INTERUNIVERSITAIR INSTITUUT VOOR BIOTECHNOL 



<120> NUCLEIC ACID BINDING OF MULTI-ZINC FINGER TRANSCRIPTION FACTORS 

<130> JAR/SIP/V042 

<140> PCT/EP00/05582 
<141> 2000-06-09 

<150> 99202068.5 
<151> 1999-06-25 

<160> 50 

<170> PatentIn Ver. 2.1 

<210> 1 
<211> 11 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: part of bait 
for screening 

<220> 

<221> misc_feature 
<222> (6) 

<223> n is a spacer sequence of at least 8 base pairs 
<400> 1 

cacctncacc t 11 

<210> 2 
<211> 11 
<212> DNA 

<213> Artificial Sequence 

<220> 
<223> Description of Artificial Sequence: part of bait 
for screening 

<220> 

<221> misc_feature 
<222> (6) 

<223> n is a spacer sequence of at least 8 base pairs 
<400> 2 

cacctnaggt g 11 

<210> 3 
<211> 11 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: part of bait for screening 
<220> 

<221> misc_feature 
<222> (6) 

<223> n is a spacer sequence of at least 8 base pairs 
<400> 3 

aggtgncacc t 11 

<210> 4 
<211> 11 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: part of bait 
for screening 

<220> 

<221> misc_feature 
<222> (6) 

<223> n is a spacer sequence of at least 8 base pairs 
<400> 4 

aggtgnaggt g 11 



<210> 5 
<211> 12 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: bipartite element 
<220> 

<221> misc_feature 
<222> (6) 

<223> n is a spacer sequence of at least 8 base pairs 
<400> 5 

cacctncacc tg 12 

<210> 6 
<211> 25 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: complex 
consensus sequence 

<220> 

<221> misc_feature 
<222> (16) 

<223> n is a spacer sequence of at the most 28 base pairs 
<400> 6 

gacaagataa gataanctca tcttc 25 



<210> 7 
<211> 30 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: primer SIP1 NZF3Mut 
<400> 7 

ccacctgaaa gaatccctga gaattcacag 30 

<210> 8 
<211> 30 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: primer SIP1 
NZF4Mut 

<400> 8 

gggtcctaca gttcatctat cagcagcaag 30 

<210> 9 
<211> 30 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: primer SIP1 CZF2Mut 
<400> 9 

caccacctta tcgagtcctc gaggctgcac 30 

<210> 10 
<211> 30 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: primer SIP1 
CZF3Mut 

<400> 10 

tcctactcgc agtccatgaa tcacaggtac 30 
<210> 11 
<211> 50 
<212> DNA 

<213> Artificial Sequence 

<220> 

<223> Description of Artificial Sequence: probe Xbra-WT 
<400> 11 

atccaggcca cctaaaatat agaatgataa agtgaccagg tgtcagttct 50 

<210> 12 
<211> 50 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: probe Xbra-D 
<400> 12 

atccaggcca cctaaaatat agaatgataa agtgaccaga tgtcagttct 50 

<210> 13 

<211> 23 

<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: probe Xbra-E 

<400> 13 

taaagtgacc aggtgtcagt tct 23 

<210> 14 
<211> 27 
<212> DNA 

<213> Artificial Sequence 

<220> 

<223> Description of Artificial Sequence: probe Xbra-F 
<400> 14 

atccaggcca cctaaaatat agaatga 27 

<210> 15 

<211> 50 
<212> DNA 

<213> Artificial Sequence 



<220> 

<223> Description of Artificial Sequence: Rdm + Xbra-E 
<400> 15 

caatttagag tactgtgtac ttgggagtaa agtgaccagg tgtcagttct 50 



<210> 16 
<211> 53 
<212> DNA 

<213> Artificial Sequence 



<220> 

<223> Description of Artificial Sequence: probe Xbra-F + AREB6 
<400> 16 

atccaggcca cctaaaatat agaatgaggc tcagacaggt gtagaattcg gcg 53 



<210> 17 

<211> 53 

<212> DNA 

<213> Artificial Sequence 



<220> 

<223> Description of Artificial Sequence: probe Rdm + AREB6 
<400> 17 

caatttagag tactgtgtac ttgggagggc tcagacaggt gtagaattcg gcg 



<210> 18 

<211> 50 

<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: probe Xbra-J 



<400> 18 

gcacaggcca cctaaaatat agaatgataa agtgaccagg tgtcagttct 

<210> 19 
<211> 50 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: probe Xbra-K 
<400> 19 

atcactgcca cctaaaatat agaatgataa agtgaccagg tgtcagttct 

<210> 20 
<211> 50 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: probe Xbra-L 
<400> 20 

atccagtaaa cctaaaatat agaatgataa agtgaccagg tgtcagttct 

<210> 21 
<211> 50 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: probe Xbra-M 



<400> 21 

atccaggccc aataaaatat agaatgataa agtgaccagg tgtcagttct 

<210> 22 
<211> 50 
<212> DNA 

<213> Artificial Sequence 



<220> 

<223> Description of Artificial Sequence: probe Xbra-N 
<400> 22 

atccaggcca ccgccaatat agaatgataa agtgaccagg tgtcagttct 

<210> 23 
<211> 50 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: probe Xbra-O 
<400> 23 

atccaggcca cctaaccgat agaatgataa agtgaccagg tgtcagttct 

<210> 24 
<211> 50 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: probe Xbra-P 
<400> 24 

atccaggcca cctaaaatcg cgaatgataa agtgaccagg tgtcagttct 

<210> 25 

<211> 50 

<212> DNA 

<213> Artificial Sequence 



<220> 

<223> Description of Artificial Sequence: probe Xbra-Q 
<400> 25 

atccaggcca cctaaaatat atcctgataa agtgaccagg tgtcagttct 
<210> 26 



<212> DNA 

<213> Artificial Sequence 



<220> 

<223> Description of Artificial Sequence: probe Xbra-R 
<400> 26 

atccaggcca cctaaaatat agaagtctaa agtgaccagg tgtcagttct 



<210> 27 
<211> 50 
<212> DNA 

<213> Artificial Sequence 



<220> 

<223> Description of Artificial Sequence: probe Xbra-S 
<400> 27 

atccaggcca tctaaaatat agaatgataa agtgaccagg tgtcagttct 



<210> 28 
<211> 50 
<212> DNA 

<213> Artificial Sequence 



<220> 

<223> Description of Artificial Sequence: probe Xbra-Z 
<400> 28 

atccaggcca cctaaaatat agaatgataa agtgactagg tgtcagttct 



<210> 29 



<400> 29 

atccaggcca cctatataga atgataaagt gaccaggtgt cagttct 



<210> 30 
<211> 47 
<212> DNA 

<213> Artificial Sequence 



<220> 

<223> Description of Artificial Sequence: probe Xbra-C 
<400> 30 

atccaggcca cctaaaatat agaatgatgt gaccaggtgt cagttct 



<210> 31 
<211> 40 
<212> DNA 

<213> Artificial Sequence 



<220> 

<223> Description of Artificial Sequence: probe Xbra-U 
<400> 31 

atccaggcca cctaaaatat agtgaccagg tgtcagttct 



<210> 32 

<211> 46 

<212> DNA 

<213> Artificial Sequence 



<220> 

<223> Description of Artificial Sequence: probe Xbra-EE 



<400> 32 

taaagtgacc aggtgtcagt tcttaaagtg accaggtgtc agttct 



<210> 33 



<211> 46 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: probe Xbra-ErE 



<400> 33 

agaactgaca cctggtcact ttataaagtg accaggtgtc agttct 

<210> 34 
<211> 50 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: probe Xbra-FrF 
<400> 34 

atccaggcca cctaaaatat agaatattct atattttagg tggcctggat 

<210> 35 
<211> 50 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: probe Xbra-V 
<400> 35 

atccaggcag gtgtaaatat agaatgataa agtgacccac ctacagttct 

<210> 36 
<211> 50 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: probe Xbra-W 
<400> 36 

<210> 37 
<211> 60 
<212> DNA 

<213> Artificial Sequence 



<220> 

<223> Description of Artificial Sequence: probe alfa4I-WT (alfa-4-integrin) 
<400> 37 

gcagggcaca cctggattgc attagaatga gactcactac ccagttcagg tgtgttgcgt 60 

<210> 38 
<211> 60 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: probe alfa4I-A (alfa-4-integrin) 
<400> 38 

gcagggcaca cctggattgc attagaatga gactcactac ccagttcaga tgtgttgcgt 60 

<210> 39 
<211> 60 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: probe alfa4I-B 
(alfa-4-integrin) 

<400> 39 

gcagggcaca tctggattgc attagaatga gactcactac ccagttcagg tgtgttgcgt 60 

<210> 40 
<211> 70 

<212> DNA 

<213> Artificial Sequence 



<220> 

<223> Description of Artificial Sequence: probe Ecad-WT 
<400> 40 

tggccggcag gtgaaccctc agccaatcag cggtacgggg ggcggtgctc cggggctcac 60 
ctggctgcag 70 



<210> 41 
<211> 70 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: probe Ecad-A 
<400> 41 

tggccggcag gtgaaccctc agccaatcag cggtacgggg ggcggtgctc cggggctcat 60 
ctggctgcag 70 

<210> 42 
<211> 70 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: probe Ecad-B 
<400> 42 

tggccggcag atgaaccctc agccaatcag cggtacgggg ggcggtgctc cggggctcac 60 
ctggctgcag 70 

<210> 43 
<211> 21 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: PCR-primer 

<400> 43 

acaaaagaac tcagccaagt g 21 

<210> 44 
<211> 18 
<212> DNA 

<213> Artificial Sequence 



<220> 

<223> Description of Artificial Sequence: PCR-primer 

<400> 44 

ccgcaagctc acaggtgc 

<210> 45 
<211> 26 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: forward primer E-boxl 
<400> 45 

gctgtggccg gcagatgaac cctcag 26

<210> 46 
<211> 26 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: reverse primer E-boxl 
<400> 46 

ctgagggttc atctgccggc cacagc 26

<210> 47 
<211> 24 
<212> DNA 

<213> Artificial Sequence 



<220> 

<223> Description of Artificial Sequence: forward primer 



<400> 47 
gctccgggct catctggctg cagc 24
<210> 48 

<211> 25

<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: reverse primer E-box3 
<400> 48 

gctgcagcca gatgagcccc ggagc 25 

<210> 49 
<211> 27 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: degenerated primer 
<400> 49 

cttccagcag ccctacgayc argcnca 27

<210> 50 
<211> 28 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: degenerated primer 
<400> 50 

gggtgtggga ccggatrtgc atyttnat 28
SEQUENCE LISTING
<110> VLAAMS INTERUNIVERSITAIR INSTITUUT VOOR BIOTECHNOL



<120> Nucleic Acid Binding of Multi-Zinc Finger Transcription Factors 

<130> 2676-5174US

<140> US/10/028,396 
<141> 2001-12-21 

<150> 99202068.5 
<151> 1999-06-25 

<150> PCT/EP00/05582 
<151> 2000-06-09 

<160> 49 

<170> PatentIn version 3.1

<210> 1 
<211> 5 
<212> DNA 
<213> Artificial 

<220> 

<221> misc_feature

<223> Description of Artificial Sequence: Portion of bait for screening 
<400> 1 

cacct 5 

<210> 2 
<211> 6 
<212> DNA 
<213> Artificial 

<220> 

<221> misc_feature

<223> Description of Artificial Sequence: portion of bait for screening 



<400> 2 

cacctg 6 

<210> 3 
<211> 5
<212> DNA 
<213> Artificial 



<220> 

<221> misc_feature

<223> Description of Artificial Sequence: portion of bait for screening 
<400> 3 

aggtg 5 

<210> 4 
<211> 7 
<212> DNA 
<213> Artificial 

<220> 

<221> misc_feature

<223> Description of Artificial Sequence: consensus element for binding
of MyT1, NZF-1 and NZF-3

<400> 4 

aaagttt 7 

<210> 5 
<211> 52 
<212> DNA 
<213> Artificial 

<220> 

<221> misc_feature

<223> Description of Artificial Sequence: complex consensus sequence
<220>

<221> misc_feature
<222> (16)..(43)

<223> nucleotides 16-43 represent a spacer sequence wherein any one, more,
or all of nucleotides 16-43 may be present or absent

<400> 5

gacaagataa gataannnnn nnnnnnnnnn nnnnnnnnnn nnnctcatct tc 52



<210> 6 
<211> 30 
<212> DNA
<213> Artificial 

<220> 

<221> misc_feature

<223> Description of Artificial Sequence: primer SIP1 NZF3Mut 
<400> 6 

ccacctgaaa gaatccctga gaattcacag 30 

<210> 7 
<211> 30 
<212> DNA 
<213> Artificial 

<220> 

<221> misc_feature

<223> Description of Artificial Sequence: primer SIP1 NZF4Mut
<400> 7 

gggtcctaca gttcatctat cagcagcaag 30 

<210> 8 
<211> 30 
<212> DNA 
<213> Artificial 

<220> 

<221> misc_feature

<223> Description of Artificial Sequence: primer SIP1 NZF4Mut
<400> 8 

caccacctta tcgagtcctc gaggctgcac 30 

<210> 9 
<211> 30 
<212> DNA 
<213> Artificial 

<220> 

<221> misc_feature
<223> Description of Artificial Sequence: primer SIP1 CZF3Mut



<400> 9 

tcctactcgc agtccatgaa tcacaggtac 30 
<210> 10
<211> 50 
<212> DNA 

<213> Artificial

<220> 

<221> misc_feature

<223> Description of Artificial Sequence: probe Xbra-WT 
<400> 10 

atccaggcca cctaaaatat agaatgataa agtgaccagg tgtcagttct 50

<210> 11 
<211> 50 
<212> DNA 
<213> Artificial 

<220> 

<221> misc_feature

<223> Description of Artificial Sequence: probe Xbra-D 
<400> 11 

atccaggcca cctaaaatat agaatgataa agtgaccaga tgtcagttct 50 

<210> 12 
<211> 23 
<212> DNA 
<213> Artificial 

<220> 

<221> misc_feature

<223> Description of Artificial Sequence: probe Xbra-E 
<400> 12 

taaagtgacc aggtgtcagt tct 23 

<210> 13 
<211> 27 
<212> DNA 
<213> Artificial 



<220>

<221> misc_feature

<223> Description of Artificial Sequence: probe Xbra-F 
<400> 13

atccaggcca cctaaaatat agaatga 27

<210> 14

<211> 50 
<212> DNA 
<213> Artificial 

<220> 

<221> misc_feature

<223> Description of Artificial Sequence: probe Rdm + Xbra-E 
<400> 14 

caatttagag tactgtgtac ttgggagtaa agtgaccagg tgtcagttct 50 



<210> 15 

<211> 53 

<212> DNA 

<213> Artificial 

<220> 

<221> misc_feature

<223> Description of Artificial Sequence: probe Xbra-F + AREB6 
<400> 15 

atccaggcca cctaaaatat agaatgaggc tcagacaggt gtagaattcg gcg 53 

<210> 16 
<211> 53 
<212> DNA 
<213> Artificial 

<220> 

<221> misc_feature

<223> Description of Artificial Sequence: probe Rdm + AREB6 
<400> 16 

caatttagag tactgtgtac ttgggagggc tcagacaggt gtagaattcg gcg 53



<210> 17 
<211> 50
<212> DNA 
<213> Artificial 
<220>

<221> misc_feature

<223> Description of Artificial Sequence: probe Xbra-J 



<400> 17 

gcacaggcca cctaaaatat agaatgataa agtgaccagg tgtcagttct 50 

<210> 18 

<211> 50 

<212> DNA 

<213> Artificial 

<220> 

<221> misc_feature

<223> Description of Artificial Sequence: probe Xbra-K 
<400> 18 

atcactgcca cctaaaatat agaatgataa agtgaccagg tgtcagttct 50 

<210> 19 

<211> 50 

<212> DNA 

<213> Artificial 

<220> 

<221> misc_feature

<223> Description of Artificial Sequence: probe Xbra-L 
<400> 19 

atccagtaaa cctaaaatat agaatgataa agtgaccagg tgtcagttct 50 

<210> 20 
<211> 50 
<212> DNA 
<213> Artificial 

<220> 

<221> misc_feature

<223> Description of Artificial Sequence: probe Xbra-M 

<400> 20 

atccaggccc aataaaatat agaatgataa agtgaccagg tgtcagttct 50 



<210> 21 
<211> 50 
<212> DNA 






<213> Artificial 
<220> 

<221> misc_feature

<223> Description of Artificial Sequence: probe Xbra-N 

<400> 21 

atccaggcca ccgccaatat agaatgataa agtgaccagg tgtcagttct 50 

<210> 22 
<211> 50 
<212> DNA 
<213> Artificial 

<220> 

<221> misc_feature

<223> Description of Artificial Sequence: probe Xbra-O
<400> 22 

atccaggcca cctaaccgat agaatgataa agtgaccagg tgtcagttct 50 

<210> 23 
<211> 50 
<212> DNA 
<213> Artificial 

<220> 

<221> misc_feature

<223> Description of Artificial Sequence: probe Xbra-P 
<400> 23 

atccaggcca cctaaaatcg cgaatgataa agtgaccagg tgtcagttct 50 

<210> 24 
<211> 50 
<212> DNA 
<213> Artificial 



<220> 

<221> misc_feature

<223> Description of Artificial Sequence: probe Xbra-Q 



<400> 24

atccaggcca cctaaaatat atcctgataa agtgaccagg tgtcagttct 50 
<210> 25



<211> 50 
<212> DNA 

<213> Artificial

<220> 

<221> misc_feature

<223> Description of Artificial Sequence: probe Xbra-R 
<400> 25 

atccaggcca cctaaaatat agaagtctaa agtgaccagg tgtcagttct 50

<210> 26 
<211> 50 
<212> DNA 
<213> Artificial 

<220> 

<221> misc_feature

<223> Description of Artificial Sequence: probe Xbra-S 
<400> 26 

atccaggcca tctaaaatat agaatgataa agtgaccagg tgtcagttct 50 

<210> 27 

<211> 50 

<212> DNA 

<213> Artificial 

<220> 

<221> misc_feature

<223> Description of Artificial Sequence: probe Xbra-Z 
<400> 27 

atccaggcca cctaaaatat agaatgataa agtgactagg tgtcagttct 50 



<210> 28 
<211> 47 
<212> DNA 
<213> Artificial 



<220> 

<221> misc_feature

<223> Description of Artificial Sequence: probe Xbra-B
<400> 28

atccaggcca cctatataga atgataaagt gaccaggtgt cagttct 47



<210> 29 
<211> 47 
<212> DNA 
<213> Artificial 

<220> 

<221> misc_feature

<223> Description of Artificial Sequence: probe Xbra-C 
<400> 29 

atccaggcca cctaaaatat agaatgatgt gaccaggtgt cagttct 47 

<210> 30 
<211> 40 
<212> DNA 
<213> Artificial 

<220> 

<221> misc_feature

<223> Description of Artificial Sequence: probe Xbra-U 
<400> 30 

atccaggcca cctaaaatat agtgaccagg tgtcagttct 40 

<210> 31 
<211> 46
<212> DNA 
<213> Artificial 

<220> 

<221> misc_feature

<223> Description of Artificial Sequence: probe Xbra-EE 
<400> 31 

taaagtgacc aggtgtcagt tcttaaagtg accaggtgtc agttct 46 

<210> 32 

<211> 46 

<212> DNA

<213> Artificial
<220>
<221> misc_feature

<223> Description of Artificial Sequence: probe Xbra-ErE 



<400> 32

agaactgaca cctggtcact ttataaagtg accaggtgtc agttct 46 

<210> 33 
<211> 50 
<212> DNA 
<213> Artificial 

<220> 

<221> misc_feature

<223> Description of Artificial Sequence: probe Xbra-FrF 
<400> 33 

atccaggcca cctaaaatat agaatattct atattttagg tggcctggat 50 

<210> 34 

<211> 50 

<212> DNA 

<213> Artificial 

<220> 

<221> misc_feature

<223> Description of Artificial Sequence: probe Xbra-V 
<400> 34 

atccaggcag gtgtaaatat agaatgataa agtgacccac ctacagttct 50 

<210> 35 
<211> 50 
<212> DNA 
<213> Artificial 



<220> 

<221> misc_feature

<223> Description of Artificial Sequence: probe Xbra-W 

<400> 35 

atccaggcag gtgtaaatat agaatgataa agtgaccagg tgtcagttct 50 



<210> 36 
<211> 60 
<212> DNA 
<213> Artificial



<220> 

<221> misc_feature

<223> Description of Artificial Sequence: probe alfa-4I-WT (alfa-4-integrin) 

<400> 36 

gcagggcaca cctggattgc attagaatga gactcactac ccagttcagg tgtgttgcgt 60

<210> 37 
<211> 60 
<212> DNA 
<213> Artificial 

<220> 

<221> misc_feature

<223> Description of Artificial Sequence: probe alfa-4I-A (alfa-4-integrin) 
<400> 37 

gcagggcaca cctggattgc attagaatga gactcactac ccagttcaga tgtgttgcgt 60 

<210> 38 
<211> 60 
<212> DNA 
<213> Artificial 

<220> 

<221> misc_feature

<223> Description of Artificial Sequence: probe alfa4-I-B (alfa-4-integrin) 
<400> 38 

gcagggcaca tctggattgc attagaatga gactcactac ccagttcagg tgtgttgcgt 60 

<210> 39 
<211> 70 
<212> DNA 
<213> Artificial 

<220> 

<221> misc_feature

<223> Description of Artificial Sequence: probe Ecad-WT

<400> 39

tggccggcag gtgaaccctc agccaatcag cggtacgggg ggcggtgctc cggggctcac 60 



ctggctgcag 70

<210> 40
<211> 70
<212> DNA

<213> Artificial 

<220> 

<221> misc_feature

<223> Description of Artificial Sequence: probe Ecad-A 
<400> 40 

tggccggcag gtgaaccctc agccaatcag cggtacgggg ggcggtgctc cggggctcat 60 

ctggctgcag 70 

<210> 41 
<211> 70 
<212> DNA 
<213> Artificial 

<220> 

<221> misc_feature

<223> Description of Artificial Sequence: probe Ecad-B 
<400> 41 

tggccggcag atgaaccctc agccaatcag cggtacgggg ggcggtgctc cggggctcac 60

ctggctgcag 70 

<210> 42 
<211> 21 
<212> DNA 
<213> Artificial 

<220> 

<221> misc_feature

<223> Description of Artificial Sequence: PCR-primer for E-cadherin promoter
sequence (-341/+41)

<400> 42 



acaaaagaac tcagccaagt g 21

<210> 43
<211> 18
<212> DNA
<213> Artificial

<220>

<221> misc_feature
<223> Description of Artificial Sequence: PCR-primer for E-cadherin promoter
sequence (-341/+41)

<400> 43 

ccgcaagctc acaggtgc 18

<210> 44 
<211> 26 
<212> DNA 
<213> Artificial 

<220> 

<221> misc_feature

<223> Description of Artificial Sequence: forward primer E-boxl 



<400> 44

gctgtggccg gcagatgaac cctcag 26 

<210> 45 
<211> 26 
<212> DNA 
<213> Artificial 

<220> 

<221> misc_feature

<223> Description of Artificial Sequence: reverse primer E-boxl 
<400> 45 

ctgagggttc atctgccggc cacagc 26 

<210> 46 
<211> 24 
<212> DNA 
<213> Artificial 

<220> 

<221> misc_feature

<223> Description of Artificial Sequence: forward primer E-box3

<400> 46

gctccgggct catctggctg cagc 24

<210> 47
<211> 25
<212> DNA 
<213> Artificial 



<220> 

<221> misc_feature

<223> Description of Artificial Sequence: reverse primer E-box3 
<400> 47 

gctgcagcca gatgagcccc ggagc 25 

<210> 48 
<211> 27 
<212> DNA 
<213> Artificial 

<220> 

<221> misc_feature

<223> Description of Artificial Sequence: degenerated primer 
<220> 

<221> misc_feature
<222> (25) 

<223> n is a spacer and may be any nucleotide 



<400> 48 

cttccagcag ccctacgayc argcnca 27



<210> 49 
<211> 28 
<212> DNA 
<213> Artificial 

<220> 

<221> misc_feature

<223> Description of Artificial Sequence: degenerated primer 



<220> 

<221> misc_feature

<222> (26)

<223> n is a spacer and may be any nucleotide



<400> 49 

gggtgtggga ccggatrtgc atyttnat 28 
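
The degenerated primers in the listing use IUPAC ambiguity codes (r, y, n). As a minimal sketch, and not part of the filing, the following Python snippet expands SEQ ID NO: 49 as printed above into the concrete sequences it represents. The function name `expand_degenerate` and the `IUPAC` table are illustrative assumptions; only the codes that occur in this primer are included.

```python
from itertools import product

# IUPAC nucleotide codes that occur in the primer (lowercase, as in the listing).
IUPAC = {
    "a": "a", "c": "c", "g": "g", "t": "t",
    "r": "ag",    # purine: a or g
    "y": "ct",    # pyrimidine: c or t
    "n": "acgt",  # any nucleotide
}

def expand_degenerate(primer: str) -> list[str]:
    """Return every concrete sequence encoded by a degenerate primer."""
    # Drop the spaces that separate the blocks of ten in the listing.
    choices = [IUPAC[base] for base in primer.replace(" ", "").lower()]
    return ["".join(combo) for combo in product(*choices)]

# SEQ ID NO: 49 as printed in the listing (28 nt).
SEQ_49 = "gggtgtggga ccggatrtgc atyttnat"

variants = expand_degenerate(SEQ_49)
print(len(variants))  # one r, one y and one n give 2 * 2 * 4 combinations
print(variants[0])
```

Each expanded sequence keeps the 28-nucleotide length stated in field <211>.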
ABSTRACT

A method of identifying transcription factors comprising providing cells with a nucleic acid
sequence at least comprising a sequence CACCT (SEQ ID NO:1) as bait for the screening of a
library encoding potential transcription factors and performing a specificity test to isolate said
factors. Preferably, the bait comprises twice the CACCT (SEQ ID NO:1) sequence, more
particularly the bait comprises one of the sequences CACCT-N-CACCT (a first SEQ ID NO:1
and a second SEQ ID NO:1 separated by N), CACCT-N-AGGTG (SEQ ID NO:1 and SEQ ID
NO:3 separated by N), AGGTG-N-CACCT (SEQ ID NO:3 and SEQ ID NO:1 separated by N),
or AGGTG-N-AGGTG (a first SEQ ID NO:3 and a second SEQ ID NO:3 separated by N),
wherein N is a spacer sequence. The transcription factors identified using the methods of the
invention include separated clusters of zinc fingers, such as, for example, a two-handed zinc finger
transcription factor. Also, at least one such zinc finger transcription factor, denominated as SIP1,
induces tumor metastasis by down regulation of the expression of E-cadherin. Compounds
interfering with SIP1 activity can thus be used to prevent tumor invasion and metastasis.
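
The four bait architectures named in the abstract can be illustrated with a plain string scan. The sketch below is not the screening method itself; it only locates CACCT (SEQ ID NO: 1) and AGGTG (SEQ ID NO: 3) sites in the Ecad-WT, Ecad-A and Ecad-B probe sequences from the sequence listing and reports the architecture and spacer length. The function names `find_sites` and `bait_architecture` are hypothetical helpers introduced for illustration.

```python
import re

def find_sites(seq: str) -> list[tuple[int, str]]:
    """Return (0-based position, site) for each CACCT or AGGTG occurrence."""
    seq = seq.replace(" ", "").lower()
    return [(m.start(), m.group()) for m in re.finditer(r"cacct|aggtg", seq)]

def bait_architecture(seq: str):
    """Describe the first two sites as 'X-N-Y' plus the spacer length N,
    or return None when fewer than two sites are present."""
    sites = find_sites(seq)
    if len(sites) < 2:
        return None
    (pos1, site1), (pos2, site2) = sites[0], sites[1]
    spacer_length = pos2 - (pos1 + len(site1))
    return f"{site1.upper()}-N-{site2.upper()}", spacer_length

# Probe sequences copied from the sequence listing.
ECAD_WT = ("tggccggcag gtgaaccctc agccaatcag cggtacgggg "
           "ggcggtgctc cggggctcac ctggctgcag")
ECAD_A = ("tggccggcag gtgaaccctc agccaatcag cggtacgggg "
          "ggcggtgctc cggggctcat ctggctgcag")
ECAD_B = ("tggccggcag atgaaccctc agccaatcag cggtacgggg "
          "ggcggtgctc cggggctcac ctggctgcag")

for name, probe in [("Ecad-WT", ECAD_WT), ("Ecad-A", ECAD_A), ("Ecad-B", ECAD_B)]:
    print(name, find_sites(probe), bait_architecture(probe))
```

Under this scan the wild-type probe carries one AGGTG and one CACCT site, while each mutant probe retains only one of the two, which is consistent with their use as single-site controls.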
APPENDIX D

(SUBSTITUTE CLAIMS
WITH MARKINGS TO SHOW CHANGES MADE) 



(Serial No. 10/028,396) 



Version with Markings to Show Changes Made 
Claims 

What is claimed is: 

1. (Amended) A process of identifying transcription factors such as activators and/or 
repressors comprising: 

providing cells with a nucleic acid sequence at least comprising a sequence CACCT ([the first 5 
nucleotides of ]SEQ ID NO: 1), preferably twice a CACCT sequence ([the first 5 
nucleotides of ]SEQ ID NO: 1), as bait(s) for the screening of a library encoding potential 
transcription factors and 

performing a specificity test to isolate said transcription factors. 

2. (Amended) A process of identifying transcription factors such as activators and/or 
repressors comprising: 

providing cells with a nucleic acid sequence comprising one of the sequences CACCT-N- 

CACCT (a first SEQ ID NO: 1 and a second SEQ ID NO: 1 separated by N), CACCT-N-
AGGTG (SEQ ID NO: [2] 1 and SEQ ID NO: 3 separated by N), AGGTG-N-CACCT
(SEQ ID NO: 3 and SEQ ID NO: 1 separated by N), or AGGTG-N-AGGTG (a first SEQ
ID NO: [4] 3 and a second SEQ ID NO: 3 separated by N) as bait wherein N is a spacer
sequence. 

3. A process according to claim 1 or claim 2 wherein the transcription factor comprises 
separated clusters of zinc fingers. 

4. A process according to claim 1, claim 2, or claim 3 wherein the sequence originates from 
a promoter region. 

5. A process according to claim 4 wherein the promoter region is selected from the group
consisting of Brachyury, α4-integrin, follistatin, and E-cadherin.

6. A transcription factor produced by the process of claim 1, claim 2, claim 3, claim 4, or
claim 5. 



7. (Amended) A process for identifying compounds with an interference capability towards 
transcription factors as defined in claim 6 by 

adding a sample comprising a potential compound to be identified to a test system comprising: (i) 
a nucleotide sequence comprising one of the sequences CACCT-N-CACCT (a first SEQ
ID NO: 1 and a second SEQ ID NO: 1 separated by N), CACCT-N-AGGTG (SEQ ID NO:
[2] 1 and SEQ ID NO: 3 separated by N), AGGTG-N-CACCT (SEQ ID NO: 3 and SEQ
ID NO: 1 separated by N), or AGGTG-N-AGGTG (a first SEQ ID NO: [4] 3 and a second
SEQ ID NO: 3 separated by N) as bait wherein N is a spacer, and (ii) a protein capable to
bind said nucleotide sequence, 

incubating said sample in said system for a period of time sufficient to permit interaction of the 
compound or its derivative or counterpart thereof with said protein, 

comparing the amount and/or activity of the protein bound to the nucleotide sequence before and 
after said adding and 

identification and optionally isolation and/or purification of the compound. 

8. The process according to claim 7 wherein the protein is a Smad-interacting protein. 

9. The process according to claim 8, wherein said Smad-interacting protein is SIP1.

10. A compound produced by the process of claim 7, claim 8, or claim 9. 

11. The compound of claim 10, wherein said compound modifies regulation of E-cadherin
expression by SIP1.

12. A pharmaceutical composition to prevent tumor invasion and/or metastasis, said 
pharmaceutical composition comprising:

the compound of claim 10 or claim 11 in an amount to prevent tumor invasion and/or metastasis

in a subject, and 
a pharmaceutically acceptable excipient. 

13. (Amended) A test kit to perform the process of claim 7, said test kit comprising: 

a nucleotide sequence comprising a sequence selected from the group consisting of CACCT-N- 
CACCT (a first SEQ ID NO: 1 and a second SEQ ID NO: 1 separated by N), CACCT-N-
AGGTG (SEQ ID NO: [2] 1 and SEQ ID NO: 3 separated by N), AGGTG-N-CACCT
(SEQ ID NO: 3 and SEQ ID NO: 1 separated by N), and AGGTG-N-AGGTG (a first SEQ
ID NO: [4] 3 and a second SEQ ID NO: 3 separated by N), wherein N is a spacer sequence
and

(ii) a protein capable of binding said nucleotide sequence.

14. (Amended) A test kit to perform the process of claim 2, said test kit comprising: 

a nucleic acid sequence comprising one of the sequences CACCT-N-CACCT (a first SEQ ID
NO: 1 and a second SEQ ID NO: 1 separated by N), CACCT-N-AGGTG (SEQ ID NO:
[2] 1 and SEQ ID NO: 3 separated by N), AGGTG-N-CACCT (SEQ ID NO: 3 and SEQ
ID NO: 1 separated by N), or AGGTG-N-AGGTG (a first SEQ ID NO: [4] 3 and a second
SEQ ID NO: 3 separated by N), wherein N is a spacer sequence.

15. (Amended) A method for detecting an interaction between a first interacting protein and 
a second interacting protein comprising: 

providing a suitable host cell with a first fusion protein comprising a first interacting protein 

fused to a DNA binding domain capable to bind a nucleic acid sequence comprising one
of the sequences CACCT-N-CACCT (a first SEQ ID NO: 1 and a second SEQ ID NO: 1
separated by N), CACCT-N-AGGTG (SEQ ID NO: [2] 1 and SEQ ID NO: 3 separated by
N), AGGTG-N-CACCT (SEQ ID NO: 3 and SEQ ID NO: 1 separated by N), or AGGTG-
N-AGGTG (a first SEQ ID NO: [4] 3 and a second SEQ ID NO: 3 separated by N),
wherein N is a spacer sequence,

providing said suitable host cell with a second fusion protein comprising a second interacting
protein fused to a DNA binding domain capable to bind a nucleic acid sequence 
comprising one of the sequences CACCT-N-CACCT (a first SEQ ID NO: 1 and a second
SEQ ID NO: 1 separated by N), CACCT-N-AGGTG (SEQ ID NO: [2] 1 and SEQ ID
NO: 3 separated by N), AGGTG-N-CACCT (SEQ ID NO: 3 and SEQ ID NO: 1 separated
by N), or AGGTG-N-AGGTG (a first SEQ ID NO: [4] 3 and a second SEQ ID NO: 3
separated by N), wherein N is a spacer sequence,

subjecting said host cell to conditions under which the first interacting protein and the second 

interacting protein are brought into close proximity and determining whether a detectable 
gene present in the host cell and located adjacent to said nucleic acid sequence has been 
expressed to a greater degree than if expressed in the absence of the interaction between 
the first and the second interacting protein. 

16. (Amended) An isolated nucleic acid sequence comprising a sequence selected from the 
group consisting of CACCT-N-CACCT (a first SEQ ID NO: 1 and a second SEQ ID NO: 1
separated by N), CACCT-N-AGGTG (SEQ ID NO: [2] 1 and SEQ ID NO: 3 separated by N),
AGGTG-N-CACCT (SEQ ID NO: 3 and SEQ ID NO: 1 separated by N), and AGGTG-N-
AGGTG (a first SEQ ID NO: [4] 3 and a second SEQ ID NO: 3 separated by N), wherein N is a
spacer. 

17. (Amended) A method of identifying a new target gene, said method comprising: 
identifying said new target gene using a nucleic acid sequence, said nucleic acid sequence 

comprising a sequence selected from the group consisting of CACCT ([the first five 
nucleotides of ]SEQ ID NO: 1), CACCT-N-CACCT (a first SEQ ID NO: 1 and a second
SEQ ID NO: 1 separated by N), CACCT-N-AGGTG (SEQ ID NO: [2] 1 and SEQ ID
NO: 3 separated by N), AGGTG-N-CACCT (SEQ ID NO: 3 and SEQ ID NO: 1 separated
by N), or AGGTG-N-AGGTG (a first SEQ ID NO: [4] 3 and a second SEQ ID NO: 3
separated by N), wherein N is a spacer.